SDS-3.x: Scalable Data Science and Distributed Machine Learning
000_1-sds-3-x: Introduction to Scalable Data Science and Distributed Machine Learning
Topics: Apache Spark, Scala, RDD, map-reduce, Ingest, Extract, Transform, Load and Explore with noSQL in SparkSQL.
1. Introduction: What is Data Science, Data Engineering and the Data Engineering Science Process?
2. Apache Spark and Big Data
3. Map-Reduce, Transformations and Actions with Resilient Distributed Datasets
4. Ingest, Extract, Transform, Load and Explore with noSQL
- Spark SQL Basics
- SparkSQL HW-a ProgGuide
- SparkSQL HW-b ProgGuide
- SparkSQL HW-c ProgGuide
- SparkSQL HW-d ProgGuide
- SparkSQL HW-e ProgGuide
- SparkSQL HW-f ProgGuide
- SparkSQL HW-g ProgGuide
- ETL Diamonds Data
- ETL Power Plant
- Wiki Clickstreams