# Data Visualization

Statistics is the mathematical science of collecting, organizing and analyzing data so that meaningful conclusions can be drawn from it. Data can be defined as groups of information that represent the qualitative or quantitative attributes of a variable or set of variables. In layman’s terms, data in statistics can be any set of information that describes a given entity. For example, if you collect the ages of the students in a given class, those ages become your data.

The Data Visualization Training Module gives the reader a thorough introduction to Data Science, Statistics, R, IBM Watson Studio and Python using real-life examples.

• Learn basic statistical concepts such as the mean and the median.
• Learn the practical implementation of statistical concepts using R, IBM Watson Studio and Python.
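
To give a flavor of what these concepts look like in practice, here is a minimal sketch in Python that computes the mean and median of a set of student ages. The ages are hypothetical values made up for illustration, and the standard-library `statistics` module is just one convenient way to do this; the course itself may use other tools such as pandas.

```python
from statistics import mean, median

# Hypothetical ages of the students in a class (illustrative values only)
ages = [19, 21, 20, 22, 20, 23, 21]

# The mean is the sum of the values divided by how many there are
average_age = mean(ages)

# The median is the middle value once the data is sorted
middle_age = median(ages)

print(f"Mean age: {average_age:.2f}")
print(f"Median age: {middle_age}")
```

The mean is sensitive to unusually large or small values, while the median is not, which is why the two are usually reported together.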

#### Course Syllabus

Module 1: From Problem to Approach

• Business Understanding – Concepts & Case Study
• Analytic Approach – Concepts & Case Study

Introduction to Statistics

• Introduction to Statistics
• Difference between inferential statistics and descriptive statistics

Inferential Statistics

• Drawing Inferences from Data
• Random Variables
• Normal Probability Distribution
• Sampling
• Sample Statistics and Sampling Distributions

R overview and Installation

• R and RStudio Installation

Descriptive Data analysis using R

• Description of basic functions used to describe data in R

Data manipulation with R

• Introduction to dplyr (filter, select, arrange, mutate, summarize)
• Introduction to data.table
• Introduction to reshape2 package
• Introduction to tidyr package
• Introduction to lubridate package

Data visualization with R

• Working with Base R Graphics (Scatter Plot, Bar Plot, and Histogram)
• Working with ggplot2
• Data visualization in Watson Studio
• Adding data to data refinery
• Visualization of Data on Watson Studio

Introduction to Python

• Python and Anaconda Installation
• Introduction to Jupyter Notebook
• Python scripting basics
• NumPy and Pandas
• NumPy overview – Creating and Accessing NumPy Arrays
• Introduction to pandas
• Pandas read and write CSV
• Descriptive statistics using pandas
• Pandas working with text data and datetime columns
• Pandas Indexing and selecting data
• Pandas – groupby
• Merge / Join datasets

Introduction to Data Visualization Tools in Python

• Introduction to Matplotlib
• Read a CSV and Generate a line plot with matplotlib

Basic plots using Matplotlib

• Area Plots
• Bar Charts
• Histograms

Specialized Visualization Tools using Matplotlib

• Pie Charts
• Box Plots
• Scatter Plots
• Bubble Plots

• Waffle Charts
• Word Clouds

Introduction to Seaborn

• Seaborn functionalities and usage with Hands-on

Spatial Visualizations and Analysis in Python with Folium

• Introduction to Folium
• Case Study (Analyze the New York City Taxi Trip Ride Data Set to Identify the Best Locations for Taxi Stops)

### Recommended skills prior to taking this course

• Basic knowledge of Python

### Requirements

• Basic knowledge of Python