SDS-2.2, Scalable Data Science from Atlantis, is a technical course in the area of Big Data, aimed at the needs of Stockholm’s data industry. It is an updated version of SDS-1.6, Scalable Data Science from Middle Earth, which was aimed at the needs of New Zealand’s data industry.
The course will introduce Spark’s core concepts via hands-on coding, including resilient distributed datasets and map-reduce algorithms, DataFrame and Spark SQL on Catalyst, scalable machine-learning pipelines in MLlib, and vertex programs using the distributed graph processing framework of GraphX. We will solve instances of real-world big data decision problems from various scientific domains.
The course is being prepared by Raazesh Sainudiin with assistance from Tilo Wiklund and Dan Strangberg.