Data Visualization

Statistics is a mathematical science that includes methods for collecting, organizing, and analyzing data so that meaningful conclusions can be drawn from them. Data can be defined as a collection of information representing the qualitative or quantitative attributes of a variable or set of variables. In layman’s terms, data in statistics is any set of information that describes a given entity. For example, if you collect the ages of the students in a class, those ages become your data.
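To make the class-ages example concrete, here is a minimal Python sketch that treats a list of ages as the collected data and summarizes it with the mean, median, and mode from the standard-library `statistics` module. The ages are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical data: ages (in years) of the ten students in one class
ages = [19, 20, 20, 21, 22, 20, 23, 19, 21, 25]

mean_age = statistics.mean(ages)      # arithmetic average
median_age = statistics.median(ages)  # middle value (average of the two middle values when n is even)
mode_age = statistics.mode(ages)      # most frequent value

print(f"mean={mean_age}, median={median_age}, mode={mode_age}")
```

With an even number of ages, the median is the average of the two middle values, which is why it can be a value no student actually has.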

About this Course

The Data Visualization Training Module gives the reader a thorough introduction to Data Science, Statistics, R, IBM Watson Studio, and Python using real-life examples.

  • Learn basic statistical concepts such as the mean and median.
  • Learn to implement statistical concepts in practice using R, IBM Watson Studio, and Python.

Course Syllabus

Module 1: From Problem to Approach

  • Business Understanding – Concepts & Case Study
  • Analytic Approach – Concepts & Case Study

Introduction to Statistics

  • Introduction to Statistics
  • Difference between inferential statistics and descriptive statistics
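The distinction can be sketched in a few lines of Python, assuming a hypothetical sample of student heights drawn from a larger school. The mean and standard deviation describe only the sample (descriptive statistics); the confidence interval uses that sample to estimate the unknown school-wide mean (inferential statistics). The interval relies on a normal approximation via `statistics.NormalDist`, which is a simplification: for a sample this small, a t-distribution critical value would be more appropriate.

```python
import math
import statistics

# Hypothetical sample: heights (cm) of 12 students from a larger school
sample = [162, 170, 158, 175, 168, 181, 165, 172, 160, 177, 169, 174]

# Descriptive statistics: summarize the data we actually collected
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)

# Inferential statistics: estimate the population mean from the sample.
# Normal-approximation 95% confidence interval (a t critical value would be
# more accurate for a sample this small).
z = statistics.NormalDist().inv_cdf(0.975)  # roughly 1.96
standard_error = sample_sd / math.sqrt(len(sample))
ci_low = sample_mean - z * standard_error
ci_high = sample_mean + z * standard_error

print(f"sample mean = {sample_mean:.1f} cm")
print(f"approx. 95% CI for the population mean: ({ci_low:.1f}, {ci_high:.1f}) cm")
```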

Inferential Statistics

  • Drawing Inferences from Data
  • Random Variables
  • Normal Probability Distribution
  • Sampling
  • Sample Statistics and Sampling Distributions
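A quick simulation can show that a sample statistic has its own distribution. This sketch assumes a hypothetical normally distributed population (mean 50, standard deviation 10), repeatedly draws samples of size 25, and records each sample mean. Theory predicts that those means center near 50 with a spread (the standard error) of about 10 / √25 = 2.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is reproducible

# Hypothetical population parameters and simulation settings
POP_MEAN = 50
POP_SD = 10
SAMPLE_SIZE = 25
NUM_SAMPLES = 2000

# Draw many samples from a normal population and keep each sample's mean
sample_means = [
    statistics.mean(random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE))
    for _ in range(NUM_SAMPLES)
]

# Compare the simulated sampling distribution with theory
theoretical_se = POP_SD / SAMPLE_SIZE ** 0.5  # standard error of the mean
print(f"mean of sample means: {statistics.mean(sample_means):.2f} (theory: {POP_MEAN})")
print(f"sd of sample means:   {statistics.stdev(sample_means):.2f} (theory: {theoretical_se:.2f})")
```

Increasing `SAMPLE_SIZE` narrows the spread of the sample means, which is the intuition behind larger samples giving more precise estimates.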

R overview and Installation

  • Overview and About R
  • R and RStudio Installation

Descriptive data analysis using R

  • Basic R functions for describing data

Data manipulation with R

  • Introduction to dplyr (filter, select, arrange, mutate, summarize)
  • Introduction to data.table
  • Introduction to the reshape2 package
  • Introduction to the tidyr package
  • Introduction to the lubridate package

Data visualization with R

  • Working with Base R Graphics (Scatter Plot, Bar Plot, and Histogram)
  • Working with ggplot2
  • Data visualization in Watson Studio
  • Adding data to Data Refinery
  • Visualizing data in Watson Studio

Introduction to Python

  • Python and Anaconda Installation
  • Introduction to Jupyter Notebook
  • Python scripting basics
  • NumPy and pandas
  • NumPy overview – Creating and Accessing NumPy Arrays
  • Introduction to pandas
  • Reading and writing CSV files with pandas
  • Descriptive statistics using pandas
  • Working with text data and datetime columns in pandas
  • Indexing and selecting data in pandas
  • Pandas – groupby
  • Merge / Join datasets
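Several of the pandas topics above (descriptive statistics, groupby, and merge/join) fit into one short sketch. The two small DataFrames are hypothetical stand-ins for real data: `trips` holds fares keyed by a zone ID, and `zones` maps each ID to a borough name.

```python
import pandas as pd

# Hypothetical data: individual taxi trips, keyed by a pickup zone ID
trips = pd.DataFrame({
    "trip_id": [1, 2, 3, 4, 5],
    "zone_id": [10, 20, 10, 30, 20],
    "fare": [12.5, 8.0, 15.0, 22.0, 9.5],
})

# Hypothetical lookup table: zone ID -> borough name
zones = pd.DataFrame({
    "zone_id": [10, 20, 30],
    "borough": ["Manhattan", "Brooklyn", "Queens"],
})

# Merge / join: attach each trip's borough (an inner join on zone_id)
merged = trips.merge(zones, on="zone_id", how="inner")

# Descriptive statistics for one numeric column (count, mean, std, quartiles, ...)
fare_summary = merged["fare"].describe()
print(fare_summary)

# Group by borough, then count trips and average the fare within each group
by_borough = merged.groupby("borough")["fare"].agg(["count", "mean"])
print(by_borough)
```

An inner join keeps only zone IDs present in both tables; `how="left"` would instead keep every trip and leave `borough` missing where no match exists.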

Introduction to Data Visualization Tools in Python

  • Introduction to Matplotlib
  • Read a CSV file and generate a line plot with Matplotlib
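A minimal sketch of that workflow, assuming pandas for reading the file and Matplotlib for plotting. To keep it self-contained, the CSV content is a hypothetical inline string wrapped in `io.StringIO`; with a real file you would pass its path to `pd.read_csv` instead. The script saves the figure to `line_plot.png` rather than opening a window.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical CSV content standing in for a real file on disk
csv_text = """year,visitors
2018,120
2019,135
2020,90
2021,110
2022,150
"""

# Read the CSV into a DataFrame (with a real file: pd.read_csv("path/to/file.csv"))
df = pd.read_csv(io.StringIO(csv_text))

# Line plot: x = year, y = visitors, with a marker at each data point
fig, ax = plt.subplots()
ax.plot(df["year"], df["visitors"], marker="o")
ax.set_xlabel("Year")
ax.set_ylabel("Visitors (thousands)")
ax.set_title("Visitors per year")

fig.savefig("line_plot.png")
```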

Basic Plots using Matplotlib

  • Area Plots
  • Bar Charts
  • Histograms
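Bar charts and histograms look similar but answer different questions, so here is a hedged Matplotlib sketch using hypothetical exam scores. A bar chart plots values you already have for each category, while `Axes.hist` bins raw numeric data and counts the observations that fall in each bin.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

# Hypothetical data: exam scores for one class of 12 students
scores = [55, 62, 68, 71, 74, 75, 78, 81, 85, 88, 92, 97]

fig, (ax_bar, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

# Bar chart: one bar per category; heights are counts we already computed
grades = ["A (90+)", "B (80-89)", "C (70-79)", "D (<70)"]
grade_counts = [2, 3, 4, 3]
ax_bar.bar(grades, grade_counts)
ax_bar.set_title("Bar chart: students per grade")
ax_bar.set_ylabel("Number of students")

# Histogram: Matplotlib bins the raw scores and counts each bin
counts, bin_edges, _ = ax_hist.hist(scores, bins=[50, 60, 70, 80, 90, 100], edgecolor="black")
ax_hist.set_title("Histogram: score distribution")
ax_hist.set_xlabel("Score")
ax_hist.set_ylabel("Number of students")

fig.tight_layout()
fig.savefig("bar_and_histogram.png")
```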

Specialized Visualization Tools using Matplotlib

  • Pie Charts
  • Box Plots
  • Scatter Plots
  • Bubble Plots
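A bubble plot is a scatter plot in which marker size encodes a third variable. This sketch draws both with Matplotlib's `Axes.scatter`, using hypothetical country-level figures; the size scale factor is arbitrary and would normally be tuned to the data. Note that the `s` parameter sets marker area in points², not radius.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

# Hypothetical data: one point per country
gdp_per_capita = [1.2, 2.5, 3.1, 4.8, 5.0]  # thousands of USD
life_expectancy = [62, 68, 71, 75, 78]       # years
population = [30, 120, 60, 200, 45]          # millions

fig, (ax_scatter, ax_bubble) = plt.subplots(1, 2, figsize=(10, 4))

# Scatter plot: relationship between two numeric variables
ax_scatter.scatter(gdp_per_capita, life_expectancy)
ax_scatter.set_title("Scatter plot")
ax_scatter.set_xlabel("GDP per capita (thousands of USD)")
ax_scatter.set_ylabel("Life expectancy (years)")

# Bubble plot: same axes, marker area proportional to population
bubble_sizes = [p * 5 for p in population]  # arbitrary scale factor
ax_bubble.scatter(gdp_per_capita, life_expectancy, s=bubble_sizes, alpha=0.5)
ax_bubble.set_title("Bubble plot (size = population)")
ax_bubble.set_xlabel("GDP per capita (thousands of USD)")

fig.tight_layout()
fig.savefig("scatter_bubble.png")
```

The semi-transparent `alpha` keeps overlapping bubbles readable, which matters more in bubble plots than in plain scatter plots.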

Advanced Visualization Tools using Matplotlib

  • Waffle Charts
  • Word Clouds

Introduction to Seaborn

  • Seaborn functionality and usage, with hands-on exercises

Spatial Visualizations and Analysis in Python with Folium

  • Introduction to Folium
  • Case Study (analyze the New York City Taxi Trip Ride Data Set to identify the best locations for taxi stops)

Recommended skills prior to taking this course

  • Basic knowledge of Python

Requirements

  • Basic knowledge of Python