SDS-3.x: Scalable Data Science and Distributed Machine Learning

000_1-sds-3-x: Introduction to Scalable Data Science and Distributed Machine Learning.

Topics: Apache Spark, Scala, RDDs, MapReduce, and how to Ingest, Extract, Load, Transform and Explore data with NoSQL in Spark SQL.

1. Introduction: What Are Data Science, Data Engineering and the Data Engineering Science Process?

2. Apache Spark and Big Data

3. Map-Reduce, Transformations and Actions with Resilient Distributed Datasets

4. Ingest, Extract, Transform, Load and Explore with NoSQL

5. Ethics, Explainability and Fairness: An Operational View
