Log in to Databricks
We will use Databricks Community Edition and, later on, the Databricks project shard granted for this course under the Databricks University Alliance, with cloud-computing grants from Databricks (waived DBU units) and AWS.
Please go here (later) for a relaxed and sufficiently detailed tour:
Databricks Community Edition
- First obtain a free Databricks Community Edition account at:
- Let's get an overview of the Databricks managed cloud for processing big data with Apache Spark.
DBC Essentials: Team, State, Collaboration, Elastic Resources in one picture
You should all have a Databricks Community Edition account by now and have successfully logged in to it!
Import Course Content Now!
Two Steps:
- Create a folder named `scalable-data-science` in your `Workspace` (NO typos: paths are hard-coded in the sequel!)
- Import the following `.dbc` archives from this URL into the `Workspace/scalable-data-science` folder you just created:
  - https://github.com/lamastex/scalable-data-science/raw/master/dbcArchives/2021/
- Start with the first file for now and import more as needed.
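If you prefer the command line over the web UI, the two import steps above can be sketched with the legacy Databricks CLI. This is only a sketch under assumptions: it assumes the `databricks` CLI is installed and configured with a token, the workspace path uses a placeholder user folder you must replace, and the archive filename is a hypothetical placeholder (use the real filenames listed at the URL above). Community Edition may restrict API/CLI access, in which case use the web UI.

```shell
# Sketch: create the course folder and import a .dbc archive via the legacy
# Databricks CLI. WORKSPACE_DIR's user segment and ARCHIVE are placeholders.
WORKSPACE_DIR="/Users/you@example.com/scalable-data-science"  # replace with your user folder
ARCHIVE="first-archive.dbc"                                   # hypothetical placeholder name

echo "importing ${ARCHIVE} into ${WORKSPACE_DIR}"

if command -v databricks >/dev/null 2>&1; then
  # create the exact folder name -- paths are hard-coded in later notebooks
  databricks workspace mkdirs "${WORKSPACE_DIR}" || echo "mkdirs failed: check CLI auth"
  # DBC is an accepted format for workspace import in the legacy CLI
  databricks workspace import --format DBC \
    "${ARCHIVE}" "${WORKSPACE_DIR}/${ARCHIVE%.dbc}" || echo "import failed: check CLI auth"
else
  echo "databricks CLI not found; use the web UI import instead"
fi
```

The web UI (Workspace > Import) does the same thing interactively and is the simpler route for a one-off import.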
Cloud-free Computing Environment
(Optional but strongly recommended)
Before we dive into the Scala crash course in a notebook, let's take a look at TASK 2 of the first step in the instructions, to set up a local, "cloud-free" computing environment, say on your laptop, here:
This can be handy for quick prototyping, and may even be necessary in projects with sensitive data that must be confined to an on-premises cluster, etc.
NOTE: This is an optional exercise, as it depends heavily on your local computing environment and your software skills (or willingness to acquire them).
CAVEAT: The docker-compose prepared for your local environment uses Spark 2.x instead of 3.x, but most of the content here runs in either version of Spark. Feel free to make a PR with the latest version of Spark :)