Data Science for Scala

Spark-Scala1

Apache Spark™ is a fast and general engine for large-scale data processing, with built-in modules for streaming, SQL, machine learning and graph processing. This course shows how to use Spark’s machine learning pipelines to fit models and search for optimal hyperparameters using a Spark cluster.

About this Scala Course

In this course you will learn about basic statistics and data types, preparing data, feature engineering, fitting a model, and pipelines and grid search.

Course Syllabus

Module 1 – Basic Statistics and Data Types

  • Vectors and Labelled Points
  • Local and Distributed Matrices
  • Summary Statistics, Correlations, and Random Data
  • Sampling
  • Hypothesis Testing

Module 2 – Preparing Data

  • Statistics, Random Data, and Sampling on DataFrames
  • Handling Missing Data and Imputing Values
  • Transformers and Estimators
  • Data Normalization
  • Identifying Outliers

Module 3 – Feature Engineering

  • Feature Vectors
  • Categorical Features
  • Using Explode, User Defined Functions, and Pivot
  • Principal Component Analysis (PCA) in Feature Engineering
  • RFormulas

Module 4 – Fitting a Model

  • Decision Trees
  • Random Forests
  • Gradient-Boosted Trees
  • Linear Methods
  • Evaluation

Module 5 – Pipeline and Grid Search

  • Predicting Grant Applications: Introduction
  • Predicting Grant Applications: Creating Features
  • Predicting Grant Applications: Building a Pipeline
  • Predicting Grant Applications: Cross Validation and Model Tuning
  • Predicting Grant Applications: Wrapping Up

GENERAL INFORMATION

  • This course is self-paced.
  • It can be taken at any time.
  • It can be audited as many times as you wish.
  • There is only ONE chance to pass the course, but multiple attempts are allowed per question.

RECOMMENDED SKILLS PRIOR TO TAKING THIS COURSE

  • General understanding of Scala
  • Experience with Java (preferred), Python, or another object-oriented language
  • General understanding of machine learning