Data Visualization

Statistics is the mathematical science of collecting, organizing and analyzing data so that meaningful conclusions can be drawn from it. Data can be defined as a collection of information representing the qualitative or quantitative attributes of a variable or set of variables. In layman’s terms, data in statistics can be any set of information that describes a given entity. Take the ages of the students in a given class: once you collect those ages, they become your data.
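As a minimal sketch of the student-ages example, the snippet below collects a small set of ages and summarizes it with the mean and median. The age values are hypothetical, chosen only for illustration, and the variable names are assumptions rather than part of the course material.

```python
from statistics import mean, median

# Hypothetical ages of the students in one class (illustrative values only).
ages = [19, 21, 20, 22, 19, 23, 20]

# The mean is the sum of the ages divided by the number of students.
average_age = mean(ages)

# The median is the middle value once the ages are sorted.
middle_age = median(ages)

print(f"Number of students: {len(ages)}")
print(f"Mean age: {average_age:.2f}")
print(f"Median age: {middle_age}")
```

Using Python's built-in `statistics` module keeps the sketch free of third-party dependencies; the same summary could also be produced with pandas or R, both of which the course covers.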
About this Course
The Data Visualization Training Module gives the reader a thorough introduction to Data Science, Statistics, R, IBM Watson Studio and Python using real-life examples.
- Learn basic concepts such as the mean and median.
- Learn to implement statistical concepts in practice using R, IBM Watson Studio and Python.
Course Syllabus
Module 1: From Problem to Approach
- Business Understanding – Concepts & Case Study
- Analytic Approach – Concepts & Case Study
Introduction to Statistics
- Introduction to Statistics
- Difference between inferential statistics and descriptive statistics
Inferential Statistics
- Drawing Inferences from Data
- Random Variables
- Normal Probability Distribution
- Sampling
- Sample Statistics and Sampling Distributions
R overview and Installation
- Overview and About R
- R and RStudio Installation
Descriptive Data Analysis using R
- Description of basic functions used to describe data in R
Data manipulation with R
- Introduction to dplyr (filter, select, arrange, mutate, summarize)
- Introduction to data.table
- Introduction to reshape2 package
- Introduction to tidyr package
- Introduction to lubridate package
Data visualization with R
- Working with Base R Graphics (Scatter Plot, Bar Plot, and Histogram)
- Working with ggplot2
Data visualization in Watson Studio
- Adding data to Data Refinery
- Visualization of Data on Watson Studio
Introduction to Python
- Python and Anaconda Installation
- Introduction to Jupyter Notebook
- Python scripting basics
NumPy and pandas
- NumPy overview – Creating and Accessing NumPy Arrays
- Introduction to pandas
- Reading and writing CSV files with pandas
- Descriptive statistics using pandas
- Pandas working with text data and datetime columns
- Pandas Indexing and selecting data
- Pandas – groupby
- Merge / Join datasets
Introduction to Data Visualization Tools in Python
- Introduction to Matplotlib
- Read a CSV and Generate a line plot with matplotlib
Basic plots using Matplotlib
- Area Plots
- Bar Charts
- Histograms
Specialized Visualization Tools using Matplotlib
- Pie Charts
- Box Plots
- Scatter Plots
- Bubble Plots
Advanced Visualization Tools using Matplotlib
- Waffle Charts
- Word Clouds
Introduction to Seaborn
- Seaborn functionalities and usage with Hands-on
Spatial Visualizations and Analysis in Python with Folium
- Introduction to Folium
- Case Study (Analyze the New York City Taxi Trip Ride Data Set to Identify the Best Locations for Taxi Stops)
Recommended skills prior to taking this course
- Basic knowledge of Python
Requirements
- Basic knowledge of Python
