Brief Overview of a 360-in-525 Minutes Course Set
For more details, see Overview of a 360-in-525 Minutes Course Set in Data Sciences, Spring 2018.
360-in-525-3: Geospatial Analytics and Big Data
This is a two-day workshop (2 hp) held on May 3-4, 2018. Prerequisites: 360-in-525-1 or Introduction to Data Science. On the first day, domain experts from Uppsala University’s Department of Social and Economic Geography introduce the basic problems and datasets of the field through hands-on lab tutorials in non-distributed geospatial analytics. The second day covers distributed geospatial analytics over real datasets that can be scaled to petabytes (the syllabus is jointly designed with experts in London’s big data industry). Topics include efficient distributed spatial joins, ingestion and representations of OpenStreetMap data that are conducive to Pregel-style distributed vertex programs, and SparkSQL and Spark Machine Learning pipelines with spatiotemporal GPS trajectories of multiple individuals.
360-in-525-1, 2 and 3 should prepare you for Microsoft Research’s urban computing and cross-domain data fusion.
Course Content
360-in-525-3: Geospatial Analytics and Big Data on May 3 2018
YouTube Archive of lab-lectures:
- 360-in-525-03 (Day-1/2-LabLec-1/4): https://youtu.be/8nDUneni_U4
- 360-in-525-03 (Day-1/2-LabLec-2/4): https://youtu.be/pGTIirYKNuI
- 360-in-525-03 (Day-1/2-LabLec-3/4): https://youtu.be/xFwh3ClvcF8
- 360-in-525-03 (Day-1/2-LabLec-4/4): https://youtu.be/08Vmb3FhA1U
SCHEDULE:
- 0830-1000: Introduction to Geospatial Analysis by John Östh and Marina Toger from Uppsala University’s Department of Social and Economic Geography, PDF slides and hyper-links
- Fika break 30 minutes - sponsored by Combient AB
- 1030-1200: Introduction to Geographical Information Systems using QGIS, Lab-Lecture by Marina Toger
- Lunch
- 1330-1500: Primer on Linear Algebra, Distributed Linear Algebra and Linear Regression Pipeline
- Fika break 30 minutes - sponsored by Combient AB
- 1530-1700: Random Forests, Gradient-boosted Regression Trees, Power Plant ML Pipeline (ETL, fitting, predicting, tuning, serving)
- Explanation of L1 and L2 penalization in regression:
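As a rough illustration of what the two penalties do (a minimal sketch, not the course's own material), consider a single coefficient whose unpenalized least-squares estimate is z. Under the simplifying assumption of an orthonormal design, the L2 (ridge) penalty rescales z towards zero, while the L1 (lasso) penalty soft-thresholds it and can set it exactly to zero. The function names and the value of `lam` below are illustrative choices, not part of any library.

```python
def ridge_shrink(z, lam):
    """Minimiser of 0.5*(w - z)**2 + 0.5*lam*w**2 (L2 penalty): pure rescaling."""
    return z / (1.0 + lam)


def lasso_shrink(z, lam):
    """Minimiser of 0.5*(w - z)**2 + lam*abs(w) (L1 penalty): soft-thresholding."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # small coefficients are set exactly to zero


# Hypothetical unpenalized estimates: one strong, one weak, one negligible.
ols = [3.0, -0.4, 0.05]
lam = 0.5  # illustrative penalty strength

ridge = [ridge_shrink(z, lam) for z in ols]
lasso = [lasso_shrink(z, lam) for z in ols]

print("ridge:", ridge)  # every coefficient shrinks but stays non-zero
print("lasso:", lasso)  # [2.5, 0.0, 0.0]
```

The contrast is the usual reason to prefer L1 when a sparse model is wanted and L2 when all features should be kept with smaller weights.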
360-in-525-3: Geospatial Analytics and Big Data on May 4 2018
YouTube Archive of lab-lectures:
- 360-in-525-03 (Day-2/2-LabLec-1/4): https://youtu.be/aSow7g2AD-A
- 360-in-525-03 (Day-2/2-LabLec-2/4): https://youtu.be/5t0o7iggCq0
- 360-in-525-03 (Day-2/2-LabLec-3/4): https://youtu.be/BAOfhTJ1z7g
- 360-in-525-03 (Day-2/2-LabLec-4/4): https://youtu.be/vN_EwI-q8ek
SCHEDULE:
- 0830-1000: Scalable Geospatial Analytics, An Introduction
- Cross-Domain Data Fusion and Knowledge Extraction (~20 minutes Lecture)
- Markov Random Forests, Activity Detection, Intro to Spark’s GraphX/GraphFrames
- Fika break 30 minutes - sponsored by Combient AB
- 1030-1200: Introduction to Pregel Distributed Vertex Programs in Spark’s GraphX and GraphFrames Library
- Lunch
- 1330-1500: Scalable Geospatial Computing with Magellan: Uber GPS trajectories in San Francisco
- Fika break 30 minutes - sponsored by Combient AB
- 1530-1700: Scalable Geospatial Constraint Satisfaction Problems, Distributed Map-Matching and Lumped Markov Chain GraphX Representations of OpenStreetMap
- A nonparametric formulation of trajectories as Markov chains over lumped state-space representations of OpenStreetMap, as a generic framework of computational/inferential thinking for geospatial data scientists
- Listen to the latest geospatial data engineering podcast by Ram Sriharsha on Software Engineering Daily.
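The idea of treating trajectories as a Markov chain over a lumped state space can be sketched on a single machine as follows. This is only a toy illustration under stated assumptions: the lumped states here are square grid cells rather than OpenStreetMap road segments, the coordinates are made up, and the helper names `lump` and `transition_matrix` are hypothetical. The workshop itself works with map-matched trajectories and GraphX.

```python
import math
from collections import Counter, defaultdict


def lump(x, y, cell=1.0):
    """Map a point to a coarse grid cell, which serves as the lumped state."""
    return (math.floor(x / cell), math.floor(y / cell))


def transition_matrix(trajectories, cell=1.0):
    """Estimate transition probabilities between lumped states by counting."""
    counts = defaultdict(Counter)
    for traj in trajectories:
        states = [lump(x, y, cell) for x, y in traj]
        # Count each observed move from one state to the next.
        for a, b in zip(states, states[1:]):
            counts[a][b] += 1
    # Normalise each row of counts into empirical probabilities.
    return {
        a: {b: n / sum(row.values()) for b, n in row.items()}
        for a, row in counts.items()
    }


# Two made-up trajectories in arbitrary planar units.
trajectories = [
    [(0.2, 0.3), (0.8, 0.6), (1.5, 0.4), (1.7, 1.2)],
    [(0.1, 0.9), (1.2, 0.2), (1.9, 0.8)],
]

P = transition_matrix(trajectories)
for state, row in sorted(P.items()):
    print(state, row)
```

Because the estimate is just normalised transition counts, it makes no parametric assumption about movement, and the counting step is a group-by-and-aggregate that distributes naturally.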
All Databricks notebooks
Import all Databricks notebooks for this module as a .dbc file from: