Local Computer

How to work with Apache Spark on or from your local laptop or desktop computer?


We will mostly be using Apache Spark via the free Databricks Community Edition at https://community.cloud.databricks.com/. Just sign up if you have not done so already.

To work on the Databricks Community Edition, all you need is a laptop with a browser and an internet connection.

If you do not own a laptop, which is highly recommended for the course, you may use a desktop computer that you have access to (perhaps in a computer lab).


It is also important to be able to run Spark locally on your laptop. You can ignore this section if you do not own a laptop.

To this end, we will use Docker to set up a local, cloud-free environment for the course:

  • Setting up the local cloud-free environment for the course
  1. Step 1: Download and install Docker by following the instructions for your operating system:
    • Linux OS
    • Mac OS
    • Windows OS
  2. Step 2: Install the Java Development Kit (JDK)
    • Linux OS
    • Mac OS
    • Windows OS
  3. Step 3: Run Docker containers from pre-built images for scalable data science
    • Follow the instructions at docker-sds to jump-start from pre-built Docker images on your laptop (2-4 GB of memory is recommended). More sophisticated Docker compositions may need 8-16 GB of memory, but these are only needed by developers and data engineers.
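Before moving on to docker-sds, you may want a quick way to confirm that Steps 1 and 2 actually worked. The Python sketch below is not part of the course materials; it assumes Python 3.7+ and that the Docker and JDK installers put the `docker`, `java`, and `javac` commands on your PATH. `docker info` fails when the Docker daemon is not running, and `javac` is only present with a full JDK (not a bare Java runtime). The memory thresholds mirror the 2-4 GB and 8-16 GB figures from Step 3, and all function names are hypothetical.

```python
import os
import shutil
import subprocess

# Thresholds mirror the course text: 2-4 GB for the pre-built images,
# 8-16 GB for heavier Docker compositions (developers only).
BASIC_GB = 2
HEAVY_GB = 8


def total_memory_gb():
    """Return physical RAM in GB using POSIX sysconf, or None if unavailable
    (for example on Windows, where os.sysconf does not exist)."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None
    return pages * page_size / 1024**3


def memory_advice(gb):
    """Map an amount of RAM (in GB) to the course's memory recommendations."""
    if gb is None:
        return "unknown: check your system settings manually"
    if gb < BASIC_GB:
        return "below the recommended 2-4 GB"
    if gb < HEAVY_GB:
        return "enough for the pre-built images"
    return "enough for heavier Docker compositions"


def command_works(*cmd, timeout=20):
    """Return True if the command is on PATH and exits with status 0."""
    if shutil.which(cmd[0]) is None:
        return False
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def check_prerequisites():
    """Summarize Steps 1-3: Docker, the JDK, and available memory."""
    gb = total_memory_gb()
    return {
        # `docker info` exits non-zero if the Docker daemon is not running.
        "docker": command_works("docker", "info"),
        # `java -version` exits 0 when a Java runtime is installed.
        "java": command_works("java", "-version"),
        # `javac -version` only succeeds with a full JDK.
        "javac": command_works("javac", "-version"),
        "memory_gb": None if gb is None else round(gb, 1),
        "memory": memory_advice(gb),
    }


if __name__ == "__main__":
    for key, value in check_prerequisites().items():
        print(f"{key:>10}: {value}")
```

Save it under any name (for example `check_setup.py`) and run it with `python check_setup.py`; any `False` entry points to the step that still needs attention.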

Optionally, after some additional downloads and setup, you can also: