SDS-3.x: Scalable Data Science and Distributed Machine Learning
000_2-sds-3-x-ml: Deeper Dive into Distributed Machine Learning
Topics: Distributed Simulation; various Unsupervised and Supervised ML Algorithms; Linear Algebra; Vertex Programming; implemented with SparkML, GraphX and Piped RDDs.
1. Creating packages within notebooks
2. Introduction to Distributed Simulation and Machine Learning
3. Unsupervised Learning - Clustering, K-Means of 1 Million Songs
- 012_UnsupervisedClustering_1MSongsKMeans_Intro
- 013_UnsupervisedClustering_1MSongsKMeans_Stage1ETL
- 014_UnsupervisedClustering_1MSongsKMeans_Stage2Explore
- 015_UnsupervisedClustering_1MSongsKMeans_Stage3Model
4. Supervised Learning - Clustering, Decision Trees and Hand-written Digit Recognition
5. Linear Algebra for Distributed Machine Learning
- 017_LAlgIntro
- 018_LinRegIntro
- 019_DistLAlgForLinRegIntro
- 019x_000_dataTypesProgGuide
- 019x_001_LocalVector
- 019x_002_LabeledPoint
- 019x_003_LocalMatrix
- 019x_004_DistributedMatrix
- 019x_005_RowMatrix
- 019x_006_IndexedRowMatrix
- 019x_007_CoordinateMatrix
- 019x_008_BlockMatrix
6. Supervised Learning - Regression and Random Forests
- 020_PowerPlantPipeline_02ModelTuneEvaluate
- 021_recognizeActivityByRandomForest
- 030_PowerPlantPipeline_03ModelTuneEvaluateDeploy
7. Distributed Vertex Programming, ETL and Graph Querying with GraphX and GraphFrames
8. Old Bailey Online - ETL of XML
9. Piped RDDs - Rigorous Bayesian A/B Testing on Old Bailey Online Data
- 033_OBO_PipedRDD_RigorousBayesianABTesting
- 033_OBO_xx0_IvanSadikov_PipedRDDhelp
- 033_OBO_xx1_OBOnlineExample
- 033_OBO_xx2_OBOnlineExampleScala