Data Science for Scala

Apache Spark™ is a fast and general engine for large-scale data processing, with built-in modules for streaming, SQL, machine learning and graph processing. This course shows how to use Spark’s machine learning pipelines to fit models and search for optimal hyperparameters using a Spark cluster.
About this Scala Course
In this course you will learn about basic statistics and data types, preparing data, feature engineering, fitting a model, and pipelines and grid search.
Course Syllabus
Module 1 – Basic Statistics and Data Types
- Vectors and Labelled Points
- Local and Distributed Matrices
- Summary Statistics, Correlations, and Random Data
- Sampling
- Hypothesis Testing
Module 2 – Preparing Data
- Statistics, Random Data, and Sampling on Data Frames
- Handling Missing Data and Imputing Values
- Transformers and Estimators
- Data Normalization
- Identifying Outliers
Module 3 – Feature Engineering
- Feature Vectors
- Categorical Features
- Using Explode, User Defined Functions, and Pivot
- Principal Component Analysis (PCA) in Feature Engineering
- RFormulas
Module 4 – Fitting a Model
- Decision Trees
- Random Forests
- Gradient-Boosted Trees
- Linear Methods
- Evaluation
Module 5 – Pipeline and Grid Search
- Predicting Grant Applications: Introduction
- Predicting Grant Applications: Creating Features
- Predicting Grant Applications: Building a Pipeline
- Predicting Grant Applications: Cross Validation and Model Tuning
- Predicting Grant Applications: Wrapping Up
GENERAL INFORMATION
- This course is self-paced.
- It can be taken at any time.
- It can be audited as many times as you wish.
- There is only ONE chance to pass the course, but you have multiple attempts per question.
RECOMMENDED SKILLS PRIOR TO TAKING THIS COURSE
- General understanding of Scala
- Experience with Java (preferred), Python, or another object-oriented language
- General understanding of machine learning
