Local Computer
How do you work with Apache Spark on, or from, your local laptop or desktop computer?
Cloud-full
We will mostly be using Apache Spark via the free Databricks Community Edition at https://community.cloud.databricks.com/. Sign up if you have not done so already.
To work on the Databricks Community Edition, all you need is a laptop computer with a browser and an internet connection.
If you do not own a laptop (having one is highly recommended for the course), you may use a desktop computer that you can access, perhaps in a computer lab.
Cloud-free
It is also important to be able to run Spark locally on your laptop. You can ignore this section if you do not own a laptop.
To this end, we will use Docker to set up the local cloud-free environment for the course:
- Setting up the local cloud-free environment for the course
  - Step 1: Download and install Docker by following the instructions below:
    - Linux OS
    - Mac OS
    - Windows OS
  - Step 2: Install JDK
    - Linux OS
    - Mac OS
    - Windows OS
  - Step 3: Run Docker from pre-built images for scalable data sciences
    - Follow the instructions at docker-sds to jump-start from pre-built Docker images and work on your laptop (2-4 GB of memory is recommended). You may need 8-16 GB of memory for more sophisticated Docker compositions, but this is only needed by developers and data engineering scientists.
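After Steps 1 and 2, it can help to confirm that both tools are visible from your shell before moving on to Step 3. The following is a minimal sketch rather than part of the official course setup: it assumes the executables are named `docker` and `java` (the usual names, but not stated in the instructions above), and `check_tools` is a hypothetical helper introduced only for illustration.

```python
import shutil

# Assumption: Docker and the JDK expose executables named "docker" and "java".
# These names are the common defaults, not something the course specifies.
REQUIRED_TOOLS = ("docker", "java")


def check_tools(tools=REQUIRED_TOOLS):
    """Return a dict mapping each tool name to its full path, or None if absent."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    # Print one status line per tool so missing installs are easy to spot.
    for tool, path in check_tools().items():
        status = f"found at {path}" if path else "not found on PATH"
        print(f"{tool}: {status}")
```

A tool reported as not found usually means the installation step was skipped or the terminal needs to be restarted to pick up the updated PATH. This check only confirms that the executables can be located; it does not verify that the Docker daemon is running.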
Optionally after some downloads and setups, you can also: