# Data Science Course Syllabus

Introduction to Data Science

• Introduction to Data Analytics
• Data Types and Data Models
• Evolution of Analytics
• Data Science Components
• Data Scientist Skillset
• Univariate Data Analysis
• Introduction to Sampling

Basic Operations in R Programming

• Introduction to R programming
• Types of Objects in R
• Naming standards in R
• Creating Objects in R
• Data Structures in R
• Matrix, Data Frame, String, Vectors
• Understanding Vectors & Data input in R
• Lists, Data Elements
• Creating Data Files using R

Data Handling in R Programming

• Basic Operations in R – Expressions, Constant Values, Arithmetic, Function Calls, Symbols
• Sub-setting Data
• Selecting (Keeping) Variables
• Excluding (Dropping) Variables
• Selecting Observations and Selection using Subset Function
• Merging Data
• Sorting Data
• Visualization using R
• Data Type Conversion
• Built-In Numeric Functions
• Built-In Character Functions
• User-Defined Functions
• Control Structures
• Loop Functions

Introduction to Statistics

• Basic Statistics
• Measures of Central Tendency
• Types of Distributions
• ANOVA
• F-Test
• Central Limit Theorem & applications
• Types of variables
• Relationships between variables
• Central Tendency
• Kurtosis
• Skewness
• Arithmetic Mean / Average
• Merits & Demerits of Arithmetic Mean
• Mode, Merits & Demerits of Mode
• Median, Merits & Demerits of Median
• Range
• Concept of Quantiles, Quartiles, and Percentiles
• Standard Deviation
• Variance
• Calculating Variance
• Covariance
• Correlation
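
As a rough illustration of how the variance, covariance, and correlation topics above relate, here is a minimal Python sketch. The function names and the sample data are assumptions made up for this example, not course material, and the formulas use the sample (n - 1) convention.

```python
from statistics import mean

def sample_variance(xs):
    """Average squared deviation from the mean, using the n - 1 divisor."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def sample_covariance(xs, ys):
    """How two variables vary together, using the n - 1 divisor."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Covariance rescaled by both standard deviations (Pearson's r)."""
    sd_x = sample_variance(xs) ** 0.5
    sd_y = sample_variance(ys) ** 0.5
    return sample_covariance(xs, ys) / (sd_x * sd_y)

# Hypothetical data: ys is exactly twice xs, so the relationship is perfectly linear.
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]
print(sample_variance(xs), sample_covariance(xs, ys), correlation(xs, ys))
```

Standard deviation is the square root of the variance, which is why it appears in the denominator of the correlation.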

Introduction to Statistics – 2

• Hypothesis Testing
• Multiple Linear Regression
• Logistic Regression
• Clustering (Hierarchical Clustering & K-means Clustering)
• Classification (Decision Trees)
• Time Series Analysis (Simple Moving Average, Exponential smoothing, ARIMA+)

Introduction to Probability

• Standard Normal Distribution
• Normal Distribution
• Geometric Distribution
• Poisson Distribution
• Binomial Distribution
• Parameters vs. Statistics
• Probability Mass Function
• Random Variable
• Conditional Probability and Independence
• Unions and Intersections
• Finding Probabilities from a Dataset
• Probability Terminology
• Probability Distributions

Data Visualization Techniques

• Bubble Chart
• Sparklines
• Waterfall chart
• Box Plot
• Line Charts
• Frequency Chart
• Bimodal & Multimodal Histograms
• Histograms
• Scatter Plot
• Pie Chart
• Bar Graph
• Line Graph

Introduction to Machine Learning

• Overview & Terminologies
• What is Machine Learning?
• Why Learn?
• When is Learning required?
• Data Mining
• Application Areas and Roles
• Types of Machine Learning
• Supervised Learning
• Unsupervised Learning
• Reinforcement learning

Machine Learning Concepts & Terminologies

• Steps in developing a Machine Learning application
• Key tasks of Machine Learning
• Modelling Terminologies
• Learning a Class from Examples
• Probability and Inference
• PAC (Probably Approximately Correct) Learning
• Noise
• Noise and Model Complexity
• Association Rules
• Association Measures

Regression Techniques

• Concept of Regression
• Best Fitting line
• Simple Linear Regression
• Building regression models using Excel
• Coefficient of determination (R-Squared)
• Multiple Linear Regression
• Assumptions of Linear Regression
• Variable transformation
• Multicollinearity
• VIF
• Methods of building Linear regression model in R
• Model validation techniques
• Cook's Distance
• Q-Q Plot
• Durbin-Watson Test
• Kolmogorov-Smirnov Test
• Homoskedasticity of error terms
• Logistic Regression
• Applications of logistic regression
• Concept of odds
• Concept of Odds Ratio
• Derivation of logistic regression equation
• Interpretation of logistic regression output
• Model building for logistic regression
• Model validations
• Confusion Matrix
• Concept of ROC Curve and AUC
• KS Test

Market Basket Analysis

• Applications of Market Basket Analysis
• What are Association Rules?
• Overview of Apriori algorithm
• Key terminology in Market Basket Analysis (MBA)
• Support
• Confidence
• Lift
• Model building for MBA
• Transforming sales data to suit MBA
• MBA Rule selection
• Ensemble modelling applications using MBA
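
The support, confidence, and lift measures listed above can be sketched in a few lines of Python. This is a minimal illustration only: the transactions and the helper names are hypothetical, and a real analysis would typically use an Apriori implementation on actual sales data.

```python
# Toy transactions, made up purely for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the share that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence relative to the consequent's baseline support; 1 means no association."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

# Rule under test: bread -> milk
print(support({"bread", "milk"}, transactions))
print(confidence({"bread"}, {"milk"}, transactions))
print(lift({"bread"}, {"milk"}, transactions))
```

A lift below 1, as for the rule in this toy data, suggests the two items appear together less often than they would if they were independent.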

Time Series Analysis (Forecasting)

• Model building using ARIMA, ARIMAX, SARIMAX
• Data De-trending & data differencing
• KPSS Test
• Dickey Fuller Test
• Concept of stationarity
• Model building using exponential smoothing
• Model building using simple moving average
• Time series analysis techniques
• Components of time series
• Prerequisites for time series analysis
• Concept of Time series data
• Applications of Forecasting

Decision Trees using R

• Understanding the Concept
• Internal decision nodes
• Terminal leaves
• Tree induction: Construction of the tree
• Classification Trees
• Entropy
• Selecting Attribute
• Information Gain
• Partially learned tree
• Overfitting
• Causes of Overfitting
• Overfitting Prevention (Pruning) Methods
• Reduced Error Pruning
• Decision trees – Advantages & Drawbacks
• Ensemble Models
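
To make the entropy and information gain topics above concrete, here is a minimal Python sketch of how a classification tree might score a candidate split. The function names and the tiny label set are assumptions for illustration, not the course's own implementation.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """Entropy of the parent node minus the size-weighted entropy of the child groups."""
    n = len(labels)
    weighted = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - weighted

# Hypothetical node with two "yes" and two "no" examples.
parent = ["yes", "yes", "no", "no"]

# A split that separates the classes perfectly removes all uncertainty.
perfect_split = [["yes", "yes"], ["no", "no"]]
# A split that leaves each child as mixed as the parent gains nothing.
useless_split = [["yes", "no"], ["yes", "no"]]

print(entropy(parent))
print(information_gain(parent, perfect_split))
print(information_gain(parent, useless_split))
```

When selecting an attribute, the tree induction step would prefer the split with the highest information gain.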

K Means Clustering

• Parametric Methods Recap
• Clustering
• Direct Clustering Method
• Mixture densities
• Classes vs. Clusters
• Hierarchical Clustering
• Dendrogram interpretation
• Non-Hierarchical Clustering
• K-Means
• Distance Metrics
• K-Means Algorithm
• K-Means Objective
• Color Quantization
• Vector Quantization

Tableau Analytics

• Tableau Introduction
• Data connection to Tableau
• Calculated fields, hierarchy, parameters, sets, groups in Tableau
• Various Visualization Techniques in Tableau
• Map based visualization using Tableau
• Reference Lines
• Adding Totals, Subtotals, Captions
• Using Combined Field
• Show Filter & Use various filter options
• Data Sorting
• Create Combined Field
• Table Calculations
• Creating Tableau Dashboard
• Action Filters
• Creating Story using Tableau

Analytics using Tableau

• Clustering using Tableau
• Time series analysis using Tableau
• Simple Linear Regression using Tableau

R integration in Tableau

• Integrating R code with Tableau
• Creating statistical model with dynamic inputs
• Visualizing R output in Tableau
• Case Study 1: Real-time project with Twitter Data Analytics
• Case Study 2: Real-time project with Google Finance
• Case Study 3: Real-time project with IMDb Website
