SDS-1.6 on databricks

Scalable Data Science from Middle Earth, A Big Data Course in Apache Spark 1.6 over databricks

How to self-learn this content?

The 2016 instance of this scalable-data-science course finished on June 30, 2016.

To learn Apache Spark for free, try Databricks Community Edition, starting from https://databricks.com/try-databricks.

All course content can be loaded for self-paced learning by copying the URL of the 2016/Spark1_6_to_1_3/scalable-data-science.dbc archive and importing the archive from that URL into your free Databricks Community Edition account.

The GitBook version of this content is at https://www.gitbook.com/book/raazesh-sainudiin/scalable-data-science/details.

The browsable GitHub Pages version of the content is at http://raazesh-sainudiin.github.io/scalable-data-science/.

How to cite this work?

Scalable Data Science, Raazesh Sainudiin and Sivanand Sivaram, Published by GitBook https://www.gitbook.com/book/raazesh-sainudiin/scalable-data-science/details, 787 pages, 30th June 2016.

Supported By

Databricks Academic Partners Program and Amazon Web Services Educate.

Summary of Contents

Contribute

All course content is currently being pushed by Raazesh Sainudiin after being tested in the Databricks cloud (mostly under Spark 1.6, with some notebooks involving Magellan under Spark 1.5.1).

The markdown version for the gitbook is generated from the Databricks .scala, .py and other source files. The gitbook is not a substitute for the Databricks notebooks available in the Databricks cloud. The following issue remains to be resolved:

  • a stable way is needed to show the output of various Databricks cells in the gitbook, including output from display_HTML and frameIt with their in-place embeds of web content.

Please feel free to fork the GitHub repository.

Furthermore, with Spark 2.0 anticipated, this mostly Spark 1.6 version could be enhanced with a version-specific upgrade to 2.0.

Please send any typos or suggestions to raazesh.sainudiin@gmail.com.

Please read a note on babel to understand how the gitbook is generated from the .scala source of the databricks notebook.
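As a rough illustration only, and not the babel tool itself, the following Python sketch shows one way a Databricks .scala source export could be turned into markdown. It assumes the usual export layout: a first line `// Databricks notebook source`, cells separated by `// COMMAND ----------`, and markdown cells written as `// MAGIC %md` lines. The function name `scala_source_to_markdown` and the sample notebook text are hypothetical, and cell outputs (the unresolved issue noted above) are not handled.

```python
from typing import List

# Markers assumed to appear in a Databricks ".scala" source export.
HEADER = "// Databricks notebook source"
CELL_SEPARATOR = "// COMMAND ----------"
MAGIC_PREFIX = "// MAGIC"
FENCE = "`" * 3  # markdown code fence, built here to avoid nesting fences


def scala_source_to_markdown(source: str) -> str:
    """Convert a Databricks .scala source export into markdown text.

    Markdown cells become plain markdown; every other cell is wrapped
    in a scala code fence. Cell outputs are not part of the export.
    """
    lines = source.splitlines()
    if lines and lines[0].strip() == HEADER:
        lines = lines[1:]  # drop the export header

    # Split the remaining lines into cells at each separator line.
    cells: List[List[str]] = [[]]
    for line in lines:
        if line.strip() == CELL_SEPARATOR:
            cells.append([])
        else:
            cells[-1].append(line)

    blocks: List[str] = []
    for cell in cells:
        # Trim blank lines at both ends of the cell.
        while cell and not cell[0].strip():
            cell = cell[1:]
        while cell and not cell[-1].strip():
            cell = cell[:-1]
        if not cell:
            continue

        if cell[0].startswith(MAGIC_PREFIX + " %md"):
            # Markdown cell: strip the "// MAGIC" prefix and one space.
            text = [l[len(MAGIC_PREFIX):] for l in cell if l.startswith(MAGIC_PREFIX)]
            text = [t[1:] if t.startswith(" ") else t for t in text]
            # Remove the "%md" marker from the first line.
            text[0] = text[0][len("%md"):].lstrip()
            body = "\n".join(text).strip()
            if body:
                blocks.append(body)
        else:
            # Code cell: keep the Scala source verbatim inside a fence.
            blocks.append(FENCE + "scala\n" + "\n".join(cell) + "\n" + FENCE)

    return "\n\n".join(blocks) + "\n"


# Hypothetical two-cell notebook export used only for demonstration.
example = """// Databricks notebook source
// MAGIC %md
// MAGIC # Word count
// MAGIC A tiny example.

// COMMAND ----------

val words = sc.parallelize(Seq("a", "b", "a"))
words.countByValue()
"""

if __name__ == "__main__":
    print(scala_source_to_markdown(example))
```

Running the script prints the markdown heading and sentence, followed by the Scala cell inside a code fence. A real converter would also need to handle other magic commands such as `%py` or `%sql`.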

Raazesh Sainudiin, Laboratory for Mathematical Statistical Experiments, Christchurch Centre and School of Mathematics and Statistics, University of Canterbury, Private Bag 4800, Christchurch 8041, Aotearoa New Zealand

Sun Jun 19 21:59:19 NZST 2016

Updated: