000_ScalableGeoSpatialComputing(Scala)

SDS-2.2-360-in-525-03: Geospatial Analytics and Big Data

This notebook needs to be updated for Spark 2.2 and more tightly integrated with other topics, such as:

  • GeoMesa
  • geolocated tweets from Twitter
  • etc.

Students need to have seen:

  • Scala crash course
  • Spark
    • DataFrames
    • RDDs
    • GraphX
    • ?

This is the 2016 version for Middle Earth, to serve as a starting point to reshape.

The HTML source URL of this Databricks notebook and its recorded Uji (image: Uji, Dogen's Time-Being):

sds/uji/week2/week10/035_ScalableGeoSpatialComputing

What is Geospatial Analytics?

Watch now (3 minutes and 23 seconds):

Spark Summit East 2016 - What is Geospatial Analytics by Ram Sri Harsha

Some Concrete Examples of Scalable Geospatial Analytics

1. Let us check out cross-domain data fusion in MSR's Urban Computing Group

Here is an excerpt from the MSR Urban Computing link above:

Urban computing is a process of acquisition, integration, and analysis of big and heterogeneous data generated by a diversity of sources in urban spaces, such as sensors, devices, vehicles, buildings, and human, to tackle the major issues that cities face, e.g. air pollution, increased energy consumption and traffic congestion. Urban computing connects unobtrusive and ubiquitous sensing technologies, advanced data management and analytics models, and novel visualization methods, to create win-win-win solutions that improve urban environment, human life quality, and city operation systems. Urban computing also helps us understand the nature of urban phenomena and even predict the future of cities. A survey paper on urban computing:

Yu Zheng, Licia Capra, Ouri Wolfson, Hai Yang. Urban Computing: Concepts, Methodologies, and Applications. ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2014.

Urban computing is also a research project in Microsoft Research, led by Dr. Yu Zheng since March 2008. By analyzing the big data generated in urban spaces, a series of urban computing applications have been enabled as follows. One of core research problems is to fuse data across different domains. The other is to learn knowledge from spatio-temporal data, e.g. trajectories.

Yu Zheng. Methodologies for Cross-Domain Data Fusion: An Overview. IEEE Transactions on Big Data, vol. 1, no. 1, 2015. (A tutorial.)

Yu Zheng. Trajectory Data Mining: An Overview. ACM Transactions on Intelligent Systems and Technology, vol. 6, no. 3, 2015. (A tutorial.)
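The two core problems named in the excerpt, fusing data across domains and learning from trajectories, both rest on a couple of simple spatial primitives. The Scala sketch below is a minimal illustration under stated assumptions, not the MSR group's method: it assumes points are WGS84 latitude/longitude in degrees, treats the Earth as a sphere of radius 6371 km for the haversine great-circle distance, and fuses two hypothetical datasets (sensor readings and GPS fixes) by snapping both onto a square grid and joining on the cell id. The object `GeoPrimitives` and all of its functions are hypothetical names chosen for this example.

```scala
// Minimal sketch (hypothetical names): spatial primitives for fusion and trajectories.
// Assumes WGS84 lat/lon in degrees and a spherical Earth.
object GeoPrimitives {
  // Mean Earth radius in kilometres (spherical approximation).
  val EarthRadiusKm = 6371.0

  // Great-circle distance between two (lat, lon) points via the haversine formula.
  def haversineKm(lat1: Double, lon1: Double, lat2: Double, lon2: Double): Double = {
    val dLat = math.toRadians(lat2 - lat1)
    val dLon = math.toRadians(lon2 - lon1)
    val a = math.pow(math.sin(dLat / 2), 2) +
      math.cos(math.toRadians(lat1)) * math.cos(math.toRadians(lat2)) *
        math.pow(math.sin(dLon / 2), 2)
    2 * EarthRadiusKm * math.asin(math.sqrt(a))
  }

  // Length of a trajectory given as an ordered sequence of (lat, lon) fixes:
  // the sum of distances between consecutive fixes (0.0 for fewer than two fixes).
  def trajectoryLengthKm(points: Seq[(Double, Double)]): Double =
    points.sliding(2).collect {
      case Seq((latA, lonA), (latB, lonB)) => haversineKm(latA, lonA, latB, lonB)
    }.sum

  // Snap a point to a square grid cell of `cellDeg` degrees; the cell id is the join key.
  def cellId(lat: Double, lon: Double, cellDeg: Double): (Int, Int) =
    (math.floor(lat / cellDeg).toInt, math.floor(lon / cellDeg).toInt)

  // Fuse two domains: mean reading of domain A and event count of domain B per shared cell.
  def fuseByCell(
      readings: Seq[(Double, Double, Double)], // (lat, lon, value), e.g. air-quality sensors
      events: Seq[(Double, Double)],           // (lat, lon), e.g. vehicle GPS fixes
      cellDeg: Double
  ): Map[(Int, Int), (Double, Int)] = {
    val meanValue = readings
      .groupBy { case (lat, lon, _) => cellId(lat, lon, cellDeg) }
      .map { case (cell, rs) => cell -> rs.map(_._3).sum / rs.size }
    val eventCount = events
      .groupBy { case (lat, lon) => cellId(lat, lon, cellDeg) }
      .map { case (cell, es) => cell -> es.size }
    // Inner join on the cell id: keep only cells observed in both domains.
    meanValue.keySet
      .intersect(eventCount.keySet)
      .map(cell => cell -> (meanValue(cell), eventCount(cell)))
      .toMap
  }
}
```

For example, `GeoPrimitives.haversineKm(0, 0, 0, 1)` is about 111.19 km, one degree of longitude at the equator. Degree-based cells are not equal-area and narrow toward the poles, so a real pipeline would more likely use a projected grid or a spatial index; the same bin-then-join pattern carries over to Spark DataFrames as a `groupBy` on a cell column followed by a `join`.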

2. Several sciences are naturally geospatial

  • forestry
  • geography
  • geology
  • seismology
  • etc.

See, for example, the global earthquake (EQ) data streams from the US Geological Survey (USGS) below.

A bold idea: imagine the non-parametric inference problem of estimating co-exciting Hawkes-like processes to model earthquakes across the entire planet!
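To make the Hawkes idea concrete, here is a hedged sketch of the simplest parametric case: a univariate, purely temporal Hawkes process with an exponential kernel, whose conditional intensity is λ(t) = μ + Σ_{t_i < t} α β exp(−β (t − t_i)). Each past event temporarily raises the rate of future events, the kind of self-excitation seen in aftershock sequences. The idea above is far more ambitious (non-parametric, spatio-temporal and co-exciting across regions), so treat this only as a building block; the object `HawkesSketch` and the parameter names `mu`, `alpha`, `beta` are illustrative assumptions.

```scala
// Sketch only: univariate temporal Hawkes process with an exponential kernel.
// Names (HawkesSketch, mu, alpha, beta) are illustrative, not from the notebook.
object HawkesSketch {
  // Conditional intensity:
  // lambda(t) = mu + sum over past events t_i < t of alpha * beta * exp(-beta * (t - t_i)).
  def intensity(t: Double, events: Seq[Double], mu: Double, alpha: Double, beta: Double): Double =
    mu + events.filter(_ < t).map(ti => alpha * beta * math.exp(-beta * (t - ti))).sum

  // Log-likelihood of events observed on [0, horizon]:
  //   sum_i log lambda(t_i)  -  integral_0^horizon lambda(s) ds,
  // where, for the exponential kernel, the integral (compensator) has the closed form
  //   mu * horizon + alpha * sum_i (1 - exp(-beta * (horizon - t_i))).
  // Assumes every event time lies in [0, horizon]. Cost is O(n^2) as written.
  def logLikelihood(events: Seq[Double], horizon: Double, mu: Double, alpha: Double, beta: Double): Double = {
    val sorted = events.sorted
    val logIntensities = sorted.map(ti => math.log(intensity(ti, sorted, mu, alpha, beta))).sum
    val compensator =
      mu * horizon + alpha * sorted.map(ti => 1 - math.exp(-beta * (horizon - ti))).sum
    logIntensities - compensator
  }
}
```

With this parametrisation α is the expected number of direct offspring per event (the branching ratio), so the process is stationary only for α < 1. The exponential kernel also admits an O(n) recursion for the intensity sum, which would matter before scaling the likelihood to a global catalogue.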

For a global data source, see the US Geological Survey's Earthquake Hazards Program: http://earthquake.usgs.gov/data/
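As a starting point for working with these data streams, the sketch below parses rows of a USGS earthquake CSV file into a small case class and keeps only the larger events. It assumes, and you should check this against the header of the file you actually download, that the first five columns are `time,latitude,longitude,depth,mag`; the object `UsgsCsvSketch` and its functions are hypothetical.

```scala
// Sketch: parse rows of a USGS earthquake CSV file.
// ASSUMPTION: the leading columns are time,latitude,longitude,depth,mag.
object UsgsCsvSketch {
  final case class Quake(time: String, lat: Double, lon: Double, depthKm: Double, mag: Double)

  // Only the first five fields are used; they precede any quoted free-text column
  // (such as a place description containing commas), so a plain comma split suffices.
  // Rows that are too short or have a missing/non-numeric value yield None.
  def parseRow(line: String): Option[Quake] = {
    val f = line.split(",", -1)
    if (f.length < 5) None
    else scala.util.Try(Quake(f(0), f(1).toDouble, f(2).toDouble, f(3).toDouble, f(4).toDouble)).toOption
  }

  // Skip the header line, drop unparseable rows, keep events at or above minMag.
  def significant(lines: Seq[String], minMag: Double): Seq[Quake] =
    lines.drop(1).flatMap(line => parseRow(line).toList).filter(_.mag >= minMag)
}
```

For larger downloads, the same file could instead be loaded with Spark's CSV reader, spark.read.option("header", "true").csv(path), and filtered on the mag column after casting it to a double (or after enabling the inferSchema option).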

Marina, let's add more stuff here...