Brief Overview of a 360-in-525 Minutes Course Set

For more details, see Overview of a 360-in-525 Minutes Course Set in Data Sciences, Spring 2018.

360-in-525-1: Introduction to Apache Spark for Data Scientists

This is a one-day workshop (1 hp) on April 20, 2018, on Apache Spark, one of the most widely used open-source and commercially friendly frameworks for analysing big data in industry and academia. A crash course in Scala, the language Apache Spark is written in, will be followed by an introduction to resilient distributed datasets (RDDs), their transformations and actions, Spark Datasets and DataFrames, and Spark SQL. ML Pipelines, Streaming and GraphX get only brief teasers here, since they are covered in depth in the subsequent modules. Concepts will be reinforced by homework assignments that you are expected to complete!
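As a small taste of the topics listed above, here is a minimal sketch of the difference between RDD transformations and actions, followed by the same data queried as a DataFrame with Spark SQL. It assumes a standalone Scala program with `spark-sql` on the classpath, running in local mode. The object name `RddTeaser`, the helper `evenSquares`, the view name `squares` and the column name `sq` are hypothetical choices made for illustration. In a Databricks notebook, `spark` and `sc` are already provided, so you would skip building and stopping the session.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Hypothetical teaser object, not taken from the course notebooks.
object RddTeaser {
  // Transformations (filter, map) only describe a new RDD; nothing is computed yet.
  def evenSquares(nums: RDD[Int]): RDD[Int] =
    nums.filter(_ % 2 == 0).map(x => x * x)

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, using all cores of this machine.
    val spark = SparkSession.builder()
      .appName("rdd-teaser")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    val squares = evenSquares(sc.parallelize(1 to 10))

    // Actions (collect, reduce) trigger the actual distributed computation.
    println(squares.collect().mkString(", ")) // 4, 16, 36, 64, 100
    println(squares.reduce(_ + _))            // 220

    // The same data as a DataFrame, registered as a view and queried with Spark SQL.
    import spark.implicits._
    squares.toDF("sq").createOrReplaceTempView("squares")
    spark.sql("SELECT SUM(sq) AS total FROM squares").show()

    spark.stop()
  }
}
```

Because transformations are lazy, Spark can plan the whole `filter` then `map` chain before running anything. In this sketch `squares` is recomputed by each action; calling `squares.cache()` would keep it in memory after the first one.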

Course Content

YouTube archive of lab lectures:

Databricks notebooks, individually

All Databricks notebooks

Import all Databricks notebooks for this module as a .dbc file from:
