031_GeospatialAnalyticsInMagellan(Scala)

Note for Spark 2.4.3

As of 2019-06-14, the latest Maven coordinates for Magellan lag behind GitHub and do not work with Spark 2.3.0 and later. For Spark 2.3.x support, you need to clone and build the latest version of Magellan from GitHub:

However, this version does not work with Spark 2.4.x (see this pull request). There is a fork of the Magellan repo that adds support for Spark 2.4.3:

Instructions

  1. If you need support for Spark 2.4.3 use git to clone the forked repo. If you need support for Spark 2.3.x clone the official repo.
  2. Inside the cloned directory, run sbt assembly to package a jar with Magellan and all of its dependencies. The jar can be found in the target/scala-2.11 directory inside the repo.
  3. In Databricks choose Create -> Library and upload the packaged jar.
  4. Create a cluster with the uploaded Magellan library installed. If you installed the library on a cluster that is already running, detach and re-attach any notebook currently using that cluster.

We have already sbt-assembled the right jar file for you at:

NOTE: The magellan library's usual maven coordinates harsha2010:magellan:1.0.6-s_2.11 may be outdated, but it is here for your future reference. You can follow instructions here to assemble the master jar if needed:

What is Geospatial Analytics?

(watch now 3 minutes and 23 seconds: 111-314 seconds):

Spark Summit East 2016 - What is Geospatial Analytics by Ram Sri Harsha

Needs the Magellan jar assembled from a fork (see the instructions above):

Some Concrete Examples of Scalable Geospatial Analytics

Let us check out cross-domain data fusion in MSR's Urban Computing Group

Several sciences are naturally geospatial

  • forestry,
  • geography,
  • geology,
  • seismology,
  • ecology,
  • etc. etc.

See, for example, the global earthquake data streams from the US Geological Survey below.

For a global data source, see the US Geological Survey's Earthquake Hazards Program: http://earthquake.usgs.gov/data/.

REDO

https://magellan.ghost.io/how-does-magellan-scale-geospatial-queries/

Introduction to Magellan for Scalable Geospatial Analytics

This is a minor augmentation of Ram Harsha's Magellan code blogged here:

Do we need one more geospatial analytics library?

From Ram's slide 4 of this Spark Summit East 2016 talk at slideshare:

  • Spatial Analytics at scale is challenging
    • Simplicity + Scalability = Hard
  • Ancient Data Formats
    • metadata, indexing not handled well, inefficient storage
  • Geospatial Analytics is not simply Business Intelligence anymore
    • Statistical + Machine Learning being leveraged in geospatial
  • Now is the time to do it!
    • Explosion of mobile data
    • Finer granularity of data collection for geometries
    • Analytics stretching the limits of traditional approaches
    • Spark SQL + Catalyst + Tungsten makes extensible SQL engines easier than ever before!

Nuts and Bolts of Magellan

This is an expansion of the following Databricks notebook:

and look at the magellan README in github:

HOMEWORK: Watch the magellan presentation by Ram Harsha (Hortonworks) in Spark Summit East 2016.

Other resources for magellan:

Let's get our hands dirty with basics in magellan.

Spatial Data Structures

  • Points
  • Polygons
  • Lines
  • Polylines

Users' View of Spatial Data Structures (details are typically "invisible" to user)

Predicates

  • within
  • intersects
// create a points DataFrame
val points = sc.parallelize(Seq((-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0))).toDF("x", "y")
points: org.apache.spark.sql.DataFrame = [x: double, y: double]
// transform (lat,lon) into Point using custom user-defined function
import magellan.Point // just Point
import org.apache.spark.sql.functions.udf
val toPointUDF = udf{(x:Double,y:Double) => Point(x,y) }
import magellan.Point
import org.apache.spark.sql.functions.udf
toPointUDF: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function2>,org.apache.spark.sql.types.PointUDT@6750528c,Some(List(DoubleType, DoubleType)))
// let's show the results of the DF with a new column called point
points.withColumn("point", toPointUDF($"x", $"y")).show()
+----+----+-----------------+
|   x|   y|            point|
+----+----+-----------------+
|-1.0|-1.0|Point(-1.0, -1.0)|
|-1.0| 1.0| Point(-1.0, 1.0)|
| 1.0|-1.0| Point(1.0, -1.0)|
+----+----+-----------------+
// let's show the results of the DF with a new column called point
// slicker with ' instead of $"" as follows for column names
points.withColumn("point", toPointUDF('x, 'y)).show()
+----+----+-----------------+
|   x|   y|            point|
+----+----+-----------------+
|-1.0|-1.0|Point(-1.0, -1.0)|
|-1.0| 1.0| Point(-1.0, 1.0)|
| 1.0|-1.0| Point(1.0, -1.0)|
+----+----+-----------------+
points.show
+----+----+
|   x|   y|
+----+----+
|-1.0|-1.0|
|-1.0| 1.0|
| 1.0|-1.0|
+----+----+
// Let's instead use the built-in expression to do the same - it's much faster on larger DataFrames due to code-gen
import org.apache.spark.sql.magellan.dsl.expressions._
val points = sc.parallelize(Seq((-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0))).toDF("x", "y").select(point($"x", $"y").as("point"))

points.show()
+-----------------+
|            point|
+-----------------+
|Point(-1.0, -1.0)|
| Point(-1.0, 1.0)|
| Point(1.0, -1.0)|
+-----------------+
import org.apache.spark.sql.magellan.dsl.expressions._
points: org.apache.spark.sql.DataFrame = [point: point]
display(points) // busted in bleeding-edge magellan we need for computing
Point(-1.0, -1.0)
Point(-1.0, 1.0)
Point(1.0, -1.0)
points.show()
+-----------------+
|            point|
+-----------------+
|Point(-1.0, -1.0)|
| Point(-1.0, 1.0)|
| Point(1.0, -1.0)|
+-----------------+

The latest version of magellan seems to have issues with the databricks display function. We will ignore this convenience of display and continue with our analysis.

This is a databricks display of magellan points when it is working properly in Spark 2.2.

Let's verify empirically if it is indeed faster for larger DataFrames.

// to generate a sequence of pairs of random numbers we can do:
import util.Random.nextDouble
Seq.fill(10)((-1.0*nextDouble,+1.0*nextDouble))
import util.Random.nextDouble
res8: Seq[(Double, Double)] = List((-0.11710147373398083,0.8983950860024985), (-0.7919110802439387,0.24630710453027382), (-0.8686384637905502,0.6107993919836924), (-0.7720477392665376,0.26927119749728146), (-0.7940086039135509,0.19383519165041008), (-0.944232150749384,0.27572597473951754), (-0.4345417828842526,0.6401409444751863), (-0.6338182823007251,0.5253875378747945), (-0.8687089939542223,0.8434190703571257), (-0.593313717103104,0.28098640323918034))
// using the UDF method with 10 million points we can do a count action of the DF with point column
// don't add too many zeros as it may crash your driver program
sc.parallelize(Seq.fill(10000000)((-1.0*nextDouble,+1.0*nextDouble)))
  .toDF("x", "y")
  .withColumn("point", toPointUDF('x, 'y))
  .count()
res11: Long = 10000000
// it should be twice as fast with code-gen especially when we are ingesting from dbfs as opposed to 
// using Seq.fill in the driver...
sc.parallelize(Seq.fill(10000000)((-1.0*nextDouble,+1.0*nextDouble)))
  .toDF("x", "y")
  .withColumn("point", point('x, 'y))
  .count()
res10: Long = 10000000
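
The two counts above can be compared by timing each cell. Databricks reports a cell's run time, and `spark.time { ... }` prints the elapsed time of a block; the sketch below is a plain-Scala alternative that also returns the measurement. The helper name `timed` is hypothetical, and the workload is a stand-in for one of the `count()` pipelines.

```scala
// Hypothetical helper (not part of Spark or Magellan): run a block once and
// return its result together with the elapsed wall-clock time in seconds.
def timed[A](block: => A): (A, Double) = {
  val start = System.nanoTime()
  val result = block
  val seconds = (System.nanoTime() - start) / 1e9
  (result, seconds)
}

// Stand-in workload; in the notebook this would be one of the count() pipelines above.
val (total, seconds) = timed { (1 to 1000000).map(_.toLong).sum }
println(f"sum = $total, took $seconds%.3f s")
```

A single run is a rough measurement only: the first action also pays for JIT warm-up and for building the sequence in the driver, so repeating each variant a few times gives a fairer comparison.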
// Create a Polygon DataFrame
import magellan.Polygon

case class PolygonExample(polygon: Polygon)

// do this in your head / pencil-paper / black-board going clockwise
val ring = Array(Point(1.0, 1.0), Point(1.0, -1.0), Point(-1.0, -1.0), Point(-1.0, 1.0), Point(1.0, 1.0))
val polygon = Polygon(Array(0), ring)

val polygons = sc.parallelize(Seq(
  PolygonExample(Polygon(Array(0), ring))
  //Polygon(Array(0), ring)
)).toDF()

import magellan.Polygon
defined class PolygonExample
ring: Array[magellan.Point] = Array(Point(1.0, 1.0), Point(1.0, -1.0), Point(-1.0, -1.0), Point(-1.0, 1.0), Point(1.0, 1.0))
polygon: magellan.Polygon = magellan.Polygon@fc5436e9
polygons: org.apache.spark.sql.DataFrame = [polygon: polygon]
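
One way to check the direction in which a ring is traversed is the signed area from the shoelace formula. The sketch below uses plain tuples rather than Magellan points, and `signedArea` is an illustrative name, not a Magellan function. For the ring used above the signed area is negative, which means the vertices run clockwise; this is also the order the ESRI shapefile format uses for outer rings.

```scala
// Signed area via the shoelace formula: positive for a counter-clockwise ring,
// negative for a clockwise one (x to the right, y upwards).
def signedArea(ring: Seq[(Double, Double)]): Double =
  ring.zip(ring.tail).map { case ((x1, y1), (x2, y2)) => x1 * y2 - x2 * y1 }.sum / 2.0

// Same coordinates as the ring above, as plain tuples (closed: first == last).
val squareRing = Seq((1.0, 1.0), (1.0, -1.0), (-1.0, -1.0), (-1.0, 1.0), (1.0, 1.0))

val area = signedArea(squareRing)
val orientation = if (area < 0) "clockwise" else "counter-clockwise"
println(s"signed area = $area, orientation = $orientation")
```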
polygons.show(false)
+------------------------+
|polygon                 |
+------------------------+
|magellan.Polygon@89f47b6|
+------------------------+
display(polygons) // not much can be seen as it's in the object
magellan.Polygon@bff1092b

This is a databricks display of magellan polygon when it is working properly in Spark 2.2 on another databricks run-time.

Predicates

import org.apache.spark.sql.types._

import org.apache.spark.sql.types._
// join points with polygons upon intersection
points.join(polygons)
      .where($"point" intersects $"polygon")
      .count()
res16: Long = 3
points.show()
+-----------------+
|            point|
+-----------------+
|Point(-1.0, -1.0)|
| Point(-1.0, 1.0)|
| Point(1.0, -1.0)|
+-----------------+

Pop Quiz:

Which three points intersect the polygon?

More generally we can have more complex queries as the generic polygon need not even be a convex set.

This is not an uncommon polygon - think of shapes of parks or lakes on a map.

A bounding box for a non-convex polygon
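
A bounding box is cheap to compute and to test against, which is why it is commonly used as a first filter before an exact point-in-polygon test. The following is a minimal plain-Scala sketch; `Box`, `boundingBox` and the L-shaped polygon are made up for illustration and are not Magellan's API.

```scala
// Plain-Scala sketch; Box and boundingBox are illustrative names, not Magellan's API.
case class Box(xmin: Double, ymin: Double, xmax: Double, ymax: Double) {
  def contains(x: Double, y: Double): Boolean =
    x >= xmin && x <= xmax && y >= ymin && y <= ymax
}

def boundingBox(vertices: Seq[(Double, Double)]): Box = {
  val xs = vertices.map(_._1)
  val ys = vertices.map(_._2)
  Box(xs.min, ys.min, xs.max, ys.max)
}

// An L-shaped (non-convex) polygon, made up for this example.
val lShape = Seq((0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0), (1.0, 2.0), (0.0, 2.0))
val box = boundingBox(lShape)

println(box)
println(box.contains(1.5, 1.5)) // inside the box, but in the notch outside the polygon
println(box.contains(3.0, 1.0)) // outside the box, so certainly outside the polygon
```

A point outside the box can be rejected immediately; a point inside the box still needs the exact test, as the notch of the L shows.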

Now consider the simple points and polygons we just made, and run the following points-within-polygon join query.

// join points with polygons upon within or containment
points.join(polygons)
      .where($"point" within $"polygon")
      .count()
res18: Long = 0
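
The two counts (3 for intersects, 0 for within) are consistent with all three points lying on the boundary of the square: each is one of its corners. The sketch below illustrates that reading in plain Scala, assuming that `within` requires the strict interior while `intersects` also accepts the boundary. Both functions are illustrative stand-ins, not Magellan's implementation.

```scala
// Axis-aligned square [-1, 1] x [-1, 1], matching the polygon above.
// Both functions are illustrative stand-ins, not Magellan's implementation.
def inInterior(x: Double, y: Double): Boolean =
  x > -1.0 && x < 1.0 && y > -1.0 && y < 1.0      // boundary excluded

def inClosedSquare(x: Double, y: Double): Boolean =
  x >= -1.0 && x <= 1.0 && y >= -1.0 && y <= 1.0  // boundary included

val corners = Seq((-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0))

val touching = corners.count { case (x, y) => inClosedSquare(x, y) }
val strictlyInside = corners.count { case (x, y) => inInterior(x, y) }
println(s"touching = $touching, strictly inside = $strictlyInside")
```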

Line

//creating line from two points
import magellan.Line

case class LineExample(line: Line)

val line = Line(Point(1.0, 1.0), Point(1.0, -1.0))

val lines = sc.parallelize(Seq(
      LineExample(line)
    )).toDF()

lines.show(false)
+---------------------------------------+
|line                                   |
+---------------------------------------+
|Line(Point(1.0, 1.0), Point(1.0, -1.0))|
+---------------------------------------+
import magellan.Line
defined class LineExample
line: magellan.Line = Line(Point(1.0, 1.0), Point(1.0, -1.0))
lines: org.apache.spark.sql.DataFrame = [line: line]
display(lines)
Line(Point(1.0, 1.0), Point(1.0, -1.0))

This is a databricks display of magellan lines when it is working properly!

PolyLine

// creating polyline
import magellan.PolyLine

case class PolyLineExample(polyline: PolyLine)

val ring = Array(Point(1.0, 1.0), Point(1.0, -1.0),
      Point(-1.0, -1.0), Point(-1.0, 1.0))

val polylines = sc.parallelize(Seq(
      PolyLineExample(PolyLine(Array(0), ring))
    )).toDF()
import magellan.PolyLine
defined class PolyLineExample
ring: Array[magellan.Point] = Array(Point(1.0, 1.0), Point(1.0, -1.0), Point(-1.0, -1.0), Point(-1.0, 1.0))
polylines: org.apache.spark.sql.DataFrame = [polyline: polyline]
polylines.show(false)
+--------------------------+
|polyline                  |
+--------------------------+
|magellan.PolyLine@6b86ee8d|
+--------------------------+

This is a databricks display of magellan polyline when it is working properly!

// now let's make a polyline with two or more lines out of the same ring
val polylines2 = sc.parallelize(Seq(
  PolyLineExample(PolyLine(Array(0,2), ring)) // first line starts at index 0 and second one starts at index 2
)).toDF()

polylines2.show(false)
+--------------------------+
|polyline                  |
+--------------------------+
|magellan.PolyLine@7c8b9197|
+--------------------------+
polylines2: org.apache.spark.sql.DataFrame = [polyline: polyline]
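
The first argument of `PolyLine` (and of `Polygon`) is an array of start offsets into the vertex array, as the comment in the cell above describes. Since the printed object does not show its parts, here is a plain-Scala sketch of how such an index array can be read. `splitParts` is a hypothetical helper, and the convention that each part runs up to the next start offset is an assumption based on that comment.

```scala
// Split a flat vertex array into parts, where indices(i) is the offset of the
// first vertex of part i. Assumed convention, following the comment above.
def splitParts[A](indices: Seq[Int], vertices: Seq[A]): Seq[Seq[A]] = {
  val ends = indices.drop(1) :+ vertices.length
  indices.zip(ends).map { case (from, until) => vertices.slice(from, until) }
}

val vertices = Seq((1.0, 1.0), (1.0, -1.0), (-1.0, -1.0), (-1.0, 1.0))

val onePart  = splitParts(Seq(0), vertices)     // Array(0): a single part with all vertices
val twoParts = splitParts(Seq(0, 2), vertices)  // Array(0,2): parts start at offsets 0 and 2
println(onePart.map(_.length))
println(twoParts)
```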

You can do a bit with magellan and esri under its hood

import magellan.Point

val p = Point(1.0, -1.0)
import magellan.Point
p: magellan.Point = Point(1.0, -1.0)
//p. // uncomment line and put the cursor next to the . and hit TAB to see available methods on the magellan Point p
(p.getX, p.getY) // for example we can getX and getY values of the Point p
res26: (Double, Double) = (1.0,-1.0)
val pc = Point(0.0,0.0)
p.withinCircle(pc, 5.0) // check if Point p is within the circle of radius 5.0 around Point pc
pc: magellan.Point = Point(0.0, 0.0)
res27: Boolean = true
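
The result is what a planar distance check would give: the distance from (1.0, -1.0) to the origin is the square root of 2, which is below 5.0. The sketch below spells this out in plain Scala. That Magellan's `withinCircle` uses exactly this planar Euclidean test is an assumption, and the function here is a stand-in with the same name, not Magellan's code.

```scala
// Planar Euclidean check; an assumption about what withinCircle computes,
// not a copy of Magellan's code.
def withinCircle(p: (Double, Double), centre: (Double, Double), radius: Double): Boolean = {
  val dx = p._1 - centre._1
  val dy = p._2 - centre._2
  dx * dx + dy * dy <= radius * radius // compare squared distances, no sqrt needed
}

println(withinCircle((1.0, -1.0), (0.0, 0.0), 5.0)) // distance sqrt(2) is at most 5
println(withinCircle((1.0, -1.0), (0.0, 0.0), 1.0)) // distance sqrt(2) exceeds 1
```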
p.boundingBox // find the bounding box of p
res28: magellan.BoundingBox = BoundingBox(1.0,-1.0,1.0,-1.0)
import magellan.Point

// create a radius 0.5 buffered polygon about the centre given by Point(0.0, 1.0)
val aBufferedPolygon = Point(0.0, 1.0).buffer(0.5) 

// this used to fail, now it should work
magellan.esri.ESRIUtil.toESRIGeometry(aBufferedPolygon)

println(aBufferedPolygon)
magellan.Polygon@1a9ba3c5
import magellan.Point
aBufferedPolygon: magellan.Polygon = magellan.Polygon@1a9ba3c5

Dive here for more on magellan Point:

Knock yourself out on other Data Structures in the source.

Uber Trajectories in San Francisco

Dataset for the Demo done by Ram Sri Harsha in Europe Spark Summit 2015

First the datasets have to be loaded into distributed file store.

  • See Step 0: Downloading datasets and loading into dbfs below for doing this anew (This only needs to be done once if the data is persisted in the distributed file system).

After downloading the data, we expect to have the following files in distributed file system (dbfs):

  • all.tsv is the file of all uber trajectories
  • SFNbhd is the directory containing SF neighborhood shape files.
// display the contents of the dbfs directory "dbfs:/datasets/magellan/"
// - if you don't see files here then go to Step 0 below as explained above!
display(dbutils.fs.ls("dbfs:/datasets/magellan/")) 
path                             name     size
dbfs:/datasets/magellan/SFNbhd/  SFNbhd/  0
dbfs:/datasets/magellan/all.tsv  all.tsv  60947802
%sh
ls /dbfs/datasets
20180416121500.gkg.csv 20190517121500.gkg.csv 20190523121500.gkg.csv beijing books graphhopper instacart_2017_05_01 magellan maps MEP mini_newsgroups obo osm sds social-media-usage.csv sou streamingFiles streamingFilesNormalMixture taxis t-drive-trips tweetsStreamTmp wiki-clickstream wiki-clickstream-curr_id wiki-clickstream-prev_titled wikipedia-datasets

First five lines (rows) of the Uber data, containing: tripID, timestamp, Lat, Lon

sc.textFile("dbfs:/datasets/magellan/all.tsv").take(5).foreach(println)
00001 2007-01-07T10:54:50+00:00 37.782551 -122.445368
00001 2007-01-07T10:54:54+00:00 37.782745 -122.444586
00001 2007-01-07T10:54:58+00:00 37.782842 -122.443688
00001 2007-01-07T10:55:02+00:00 37.782919 -122.442815
00001 2007-01-07T10:55:06+00:00 37.782992 -122.442112

The neighborhood shape files for San Francisco will form the polygons of interest to us.

display(dbutils.fs.ls("dbfs:/datasets/magellan/SFNbhd")) // legacy shape files - used in various sectors
path                                                           name                            size
dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.dbf      planning_neighborhoods.dbf      1028
dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.prj      planning_neighborhoods.prj      567
dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.sbn      planning_neighborhoods.sbn      516
dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.sbx      planning_neighborhoods.sbx      164
dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.shp      planning_neighborhoods.shp      214576
dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.shp.xml  planning_neighborhoods.shp.xml  21958
dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.shx      planning_neighborhoods.shx      396

Homework

First watch the more technical magellan presentation by Ram Sri Harsha (Hortonworks) in Spark Summit Europe 2015

Ram Sri Harsha's Magellan Spark Summit EU 2015 Talk

Let's repeat Ram's original analysis from the following blog as done below.

Ram's blog in HortonWorks.

This is just to get you started... You may need to modify this!

case class UberRecord(tripId: String, timestamp: String, point: Point) // a case class for UberRecord 
defined class UberRecord
val uber = sc.textFile("dbfs:/datasets/magellan/all.tsv")
              .map { line =>
                      val parts = line.split("\t" )
                      val tripId = parts(0)
                      val timestamp = parts(1)
                      val point = Point(parts(3).toDouble, parts(2).toDouble)
                      UberRecord(tripId, timestamp, point)
                    }
                     //.repartition(100) // using default repartition
                     .toDF()
                     .cache()
uber: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [tripId: string, timestamp: string ... 1 more field]
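
Note the argument order in the parsing step above: the file stores latitude before longitude, while a point is built as (x, y) = (longitude, latitude). The cell above also assumes every line is well formed. Below is a plain-Scala sketch of a more defensive parser; `Trip` and `parseTrip` are hypothetical names, and the point is kept as two plain doubles instead of a Magellan `Point`.

```scala
import scala.util.Try

// Plain-Scala stand-in for UberRecord, with the point as an (x, y) = (lon, lat) pair.
case class Trip(tripId: String, timestamp: String, lon: Double, lat: Double)

// Returns None for lines that do not have four tab-separated fields
// or whose coordinates are not numeric.
def parseTrip(line: String): Option[Trip] =
  line.split("\t") match {
    case Array(id, ts, lat, lon) =>
      Try(Trip(id, ts, lon.toDouble, lat.toDouble)).toOption
    case _ => None
  }

val good = parseTrip("00001\t2007-01-07T10:54:50+00:00\t37.782551\t-122.445368")
val bad  = parseTrip("00001\tnot-a-full-record")
println(good)
println(bad)
```

With an RDD of lines, `flatMap(parseTrip)` would keep the valid records and silently drop the malformed ones; whether dropping is acceptable depends on the analysis.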
val uberRecordCount = uber.count() // how many Uber records?
uberRecordCount: Long = 1128663
val uberRecordCount = uber.count() // time for cached count
uberRecordCount: Long = 1128663

So there are over a million UberRecords.

sqlContext.read.format("magellan").load("dbfs:/datasets/magellan/SFNbhd/").printSchema()
root
 |-- point: point (nullable = true)
 |-- polyline: polyline (nullable = true)
 |-- polygon: polygon (nullable = true)
 |-- metadata: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)
 |-- valid: boolean (nullable = true)
val neighborhoods = sqlContext.read.format("magellan") 
                                   .load("dbfs:/datasets/magellan/SFNbhd/")
                                   .select($"polygon", $"metadata")
                                   .cache()
neighborhoods: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [polygon: polygon, metadata: map<string,string>]
neighborhoods.count() // how many neighbourhoods in SF?
res39: Long = 37
neighborhoods.printSchema
root
 |-- polygon: polygon (nullable = true)
 |-- metadata: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)
neighborhoods.show(2,false) // see the first two neighbourhoods
+-------------------------+-----------------------------------------+
|polygon                  |metadata                                 |
+-------------------------+-----------------------------------------+
|magellan.Polygon@6958b819|[neighborho -> Twin Peaks               ]|
|magellan.Polygon@eb43150 |[neighborho -> Pacific Heights          ]|
+-------------------------+-----------------------------------------+
only showing top 2 rows

You Try:

Modify the next cell to see all 37 neighborhoods.

neighborhoods.show(37,false) // modify this cell to see all 37 neighborhoods
+-------------------------+-----------------------------------------+
|polygon                  |metadata                                 |
+-------------------------+-----------------------------------------+
|magellan.Polygon@79d7786f|[neighborho -> Twin Peaks               ]|
|magellan.Polygon@f8df1df3|[neighborho -> Pacific Heights          ]|
|magellan.Polygon@a65ca66b|[neighborho -> Visitacion Valley        ]|
|magellan.Polygon@525237d |[neighborho -> Potrero Hill             ]|
|magellan.Polygon@ff7a7064|[neighborho -> Crocker Amazon           ]|
|magellan.Polygon@5f8b55b9|[neighborho -> Outer Mission            ]|
|magellan.Polygon@e1194bee|[neighborho -> Bayview                  ]|
|magellan.Polygon@3a1e0c23|[neighborho -> Lakeshore                ]|
|magellan.Polygon@d4f27423|[neighborho -> Russian Hill             ]|
|magellan.Polygon@1c18e19f|[neighborho -> Golden Gate Park         ]|
|magellan.Polygon@753cfca1|[neighborho -> Outer Sunset             ]|
|magellan.Polygon@3e8a1c1b|[neighborho -> Inner Sunset             ]|
|magellan.Polygon@3a6d8f69|[neighborho -> Excelsior                ]|
|magellan.Polygon@469ee93c|[neighborho -> Outer Richmond           ]|
|magellan.Polygon@b44a048f|[neighborho -> Parkside                 ]|
|magellan.Polygon@1a1b329 |[neighborho -> Bernal Heights           ]|
|magellan.Polygon@7e437a2b|[neighborho -> Noe Valley               ]|
|magellan.Polygon@8f65715e|[neighborho -> Presidio                 ]|
|magellan.Polygon@41ae0ee6|[neighborho -> Nob Hill                 ]|
|magellan.Polygon@e6010e5d|[neighborho -> Financial District       ]|
|magellan.Polygon@5427f397|[neighborho -> Glen Park                ]|
|magellan.Polygon@19e6e9a0|[neighborho -> Marina                   ]|
|magellan.Polygon@113ab5cc|[neighborho -> Seacliff                 ]|
|magellan.Polygon@7233c635|[neighborho -> Mission                  ]|
|magellan.Polygon@d97ad435|[neighborho -> Downtown/Civic Center    ]|
|magellan.Polygon@2f52b87a|[neighborho -> South of Market          ]|
|magellan.Polygon@24ed71c8|[neighborho -> Presidio Heights         ]|
|magellan.Polygon@856c01f |[neighborho -> Inner Richmond           ]|
|magellan.Polygon@eb4234af|[neighborho -> Castro/Upper Market      ]|
|magellan.Polygon@390c92c5|[neighborho -> West of Twin Peaks       ]|
|magellan.Polygon@f2605fa0|[neighborho -> Ocean View               ]|
|magellan.Polygon@41c54375|[neighborho -> Treasure Island/YBI      ]|
|magellan.Polygon@e5112d2e|[neighborho -> Chinatown                ]|
|magellan.Polygon@a6f4c23c|[neighborho -> Western Addition         ]|
|magellan.Polygon@b5d93a72|[neighborho -> North Beach              ]|
|magellan.Polygon@fc6df3ea|[neighborho -> Diamond Heights          ]|
|magellan.Polygon@70ae060b|[neighborho -> Haight Ashbury           ]|
+-------------------------+-----------------------------------------+
import org.apache.spark.sql.functions._ // this is needed for sql functions like explode, etc.
import org.apache.spark.sql.functions._
//names of all 37 neighborhoods of San Francisco
neighborhoods.select(explode($"metadata").as(Seq("k", "v"))).show(37,false)
+----------+-------------------------+
|k         |v                        |
+----------+-------------------------+
|neighborho|Twin Peaks               |
|neighborho|Pacific Heights          |
|neighborho|Visitacion Valley        |
|neighborho|Potrero Hill             |
|neighborho|Crocker Amazon           |
|neighborho|Outer Mission            |
|neighborho|Bayview                  |
|neighborho|Lakeshore                |
|neighborho|Russian Hill             |
|neighborho|Golden Gate Park         |
|neighborho|Outer Sunset             |
|neighborho|Inner Sunset             |
|neighborho|Excelsior                |
|neighborho|Outer Richmond           |
|neighborho|Parkside                 |
|neighborho|Bernal Heights           |
|neighborho|Noe Valley               |
|neighborho|Presidio                 |
|neighborho|Nob Hill                 |
|neighborho|Financial District       |
|neighborho|Glen Park                |
|neighborho|Marina                   |
|neighborho|Seacliff                 |
|neighborho|Mission                  |
|neighborho|Downtown/Civic Center    |
|neighborho|South of Market          |
|neighborho|Presidio Heights         |
|neighborho|Inner Richmond           |
|neighborho|Castro/Upper Market      |
|neighborho|West of Twin Peaks       |
|neighborho|Ocean View               |
|neighborho|Treasure Island/YBI      |
|neighborho|Chinatown                |
|neighborho|Western Addition         |
|neighborho|North Beach              |
|neighborho|Diamond Heights          |
|neighborho|Haight Ashbury           |
+----------+-------------------------+
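
What `explode` does to the metadata map can be mimicked on ordinary Scala collections: each map entry becomes its own (key, value) row. The sketch below uses two made-up rows and also trims the values, since the neighborhood names in the output above appear to carry trailing spaces (an assumption based on the column widths).

```scala
// Each row carries a metadata map; flatMap over its entries mimics explode,
// producing one (key, value) pair per map entry.
val metadataRows: Seq[Map[String, String]] = Seq(
  Map("neighborho" -> "Twin Peaks               "),
  Map("neighborho" -> "Pacific Heights          ")
)

val names: Seq[String] = metadataRows
  .flatMap(_.toSeq)                              // explode: one pair per entry
  .collect { case ("neighborho", v) => v.trim }  // keep the name, drop the padding

println(names)
```

In the DataFrame version, `org.apache.spark.sql.functions.trim` applied to the value column would play the role of `v.trim`.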

This join below yields nothing.

So what's going on?

Watch Ram's 2015 Spark Summit talk for details on geospatial formats and transformations.

neighborhoods
  .join(uber)
  .where($"point" within $"polygon")
  .select($"tripId", $"timestamp", explode($"metadata").as(Seq("k", "v")))
  .withColumnRenamed("v", "neighborhood")
  .drop("k")
  .show(5)

+------+---------+------------+
|tripId|timestamp|neighborhood|
+------+---------+------------+
+------+---------+------------+

Need the right transformer to transform the points into the right coordinate system of the shape files.
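
The empty join has a simple numerical explanation: the Uber points are in degrees, while the shapefile polygons are in a projected coordinate system with coordinates in the millions. The plain-Scala sketch below uses the first Uber record before and after the transformation (both pairs are copied from the output of this notebook) and a made-up box standing in for a neighborhood's bounding box.

```scala
// The first Uber record as (lon, lat) in degrees, and the same record after
// the NAD83 transformation in feet; both pairs are copied from this notebook's output.
val lonLat    = (-122.445368, 37.782551)
val projected = (5999523.477715266, 2113253.7290443885)

// A made-up 2000 ft x 2000 ft box in projected coordinates, standing in for a
// neighborhood polygon's bounding box.
val (xmin, ymin, xmax, ymax) = (5999000.0, 2112000.0, 6001000.0, 2114000.0)

def inBox(p: (Double, Double)): Boolean =
  p._1 >= xmin && p._1 <= xmax && p._2 >= ymin && p._2 <= ymax

println(inBox(lonLat))    // degrees are nowhere near the box
println(inBox(projected)) // projected feet fall inside it
```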

// This code was removed from magellan in this commit:
// https://github.com/harsha2010/magellan/commit/8df0a62560116f8ed787fc7e86f190f8e2730826
// We bring this back to show how to roll our own transformations.
// EXERCISE: find existing transformers / methods in magellan or esri to go between coordinate systems 
import magellan.Point

class NAD83(params: Map[String, Any]) {
  val RAD = 180d / Math.PI
  val ER  = 6378137.toDouble  // semi-major axis for GRS-80
  val RF  = 298.257222101  // reciprocal flattening for GRS-80
  val F   = 1.toDouble / RF  // flattening for GRS-80
  val ESQ = F + F - (F * F)
  val E   = StrictMath.sqrt(ESQ)

  private val ZONES =  Map(
    401 -> Array(122.toDouble, 2000000.0001016,
      500000.0001016001, 40.0,
      41.66666666666667, 39.33333333333333),
    403 -> Array(120.5, 2000000.0001016,
      500000.0001016001, 37.06666666666667,
      38.43333333333333, 36.5)
  )

  def from() = {
    val zone = params("zone").asInstanceOf[Int]
    ZONES.get(zone) match {
      case Some(x) => if (x.length == 5) {
        toTransverseMercator(x)
      } else {
        toLambertConic(x)
      }
      case None => ???
    }
  }

  def to() = {
    val zone = params("zone").asInstanceOf[Int]
    ZONES.get(zone) match {
      case Some(x) => if (x.length == 5) {
        fromTransverseMercator(x)
      } else {
        fromLambertConic(x)
      }
      case None => ???
    }
  }

  def qqq(e: Double, s: Double) = {
    (StrictMath.log((1 + s) / (1 - s)) - e *
      StrictMath.log((1 + e * s) / (1 - e * s))) / 2
  }

  def toLambertConic(params: Array[Double]) = {
    val cm = params(0) / RAD  // CENTRAL MERIDIAN (CM)
    val eo = params(1)  // FALSE EASTING VALUE AT THE CM (METERS)
    val nb = params(2)  // FALSE NORTHING VALUE AT SOUTHERNMOST PARALLEL (METERS), (USUALLY ZERO)
    val fis = params(3) / RAD  // LATITUDE OF SO. STD. PARALLEL
    val fin = params(4) / RAD  // LATITUDE OF NO. STD. PARALLEL
    val fib = params(5) / RAD // LATITUDE OF SOUTHERNMOST PARALLEL
    val sinfs = StrictMath.sin(fis)
    val cosfs = StrictMath.cos(fis)
    val sinfn = StrictMath.sin(fin)
    val cosfn = StrictMath.cos(fin)
    val sinfb = StrictMath.sin(fib)
    val qs = qqq(E, sinfs)
    val qn = qqq(E, sinfn)
    val qb = qqq(E, sinfb)
    val w1 = StrictMath.sqrt(1.toDouble - ESQ * sinfs * sinfs)
    val w2 = StrictMath.sqrt(1.toDouble - ESQ * sinfn * sinfn)
    val sinfo = StrictMath.log(w2 * cosfs / (w1 * cosfn)) / (qn - qs)
    val k = ER * cosfs * StrictMath.exp(qs * sinfo) / (w1 * sinfo)
    val rb = k / StrictMath.exp(qb * sinfo)

    (point: Point) => {
      val (long, lat) = (point.getX(), point.getY())
      val l = - long / RAD
      val f = lat / RAD
      val q = qqq(E, StrictMath.sin(f))
      val r = k / StrictMath.exp(q * sinfo)
      val gam = (cm - l) * sinfo
      val n = rb + nb - (r * StrictMath.cos(gam))
      val e = eo + (r * StrictMath.sin(gam))
      Point(e, n)
    }
  }

  def toTransverseMercator(params: Array[Double]) = {
    (point: Point) => {
      point
    }
  }

  def fromLambertConic(params: Array[Double]) = {
    val cm = params(0) / RAD  // CENTRAL MERIDIAN (CM)
    val eo = params(1)  // FALSE EASTING VALUE AT THE CM (METERS)
    val nb = params(2)  // FALSE NORTHING VALUE AT SOUTHERNMOST PARALLEL (METERS), (USUALLY ZERO)
    val fis = params(3) / RAD  // LATITUDE OF SO. STD. PARALLEL
    val fin = params(4) / RAD  // LATITUDE OF NO. STD. PARALLEL
    val fib = params(5) / RAD // LATITUDE OF SOUTHERNMOST PARALLEL
    val sinfs = StrictMath.sin(fis)
    val cosfs = StrictMath.cos(fis)
    val sinfn = StrictMath.sin(fin)
    val cosfn = StrictMath.cos(fin)
    val sinfb = StrictMath.sin(fib)

    val qs = qqq(E, sinfs)
    val qn = qqq(E, sinfn)
    val qb = qqq(E, sinfb)
    val w1 = StrictMath.sqrt(1.toDouble - ESQ * sinfs * sinfs)
    val w2 = StrictMath.sqrt(1.toDouble - ESQ * sinfn * sinfn)
    val sinfo = StrictMath.log(w2 * cosfs / (w1 * cosfn)) / (qn - qs)
    val k = ER * cosfs * StrictMath.exp(qs * sinfo) / (w1 * sinfo)
    val rb = k / StrictMath.exp(qb * sinfo)
    (point: Point) => {
      val easting = point.getX()
      val northing = point.getY()
      val npr = rb - northing + nb
      val epr = easting - eo
      val gam = StrictMath.atan(epr / npr)
      val lon = cm - (gam / sinfo)
      val rpt = StrictMath.sqrt(npr * npr + epr * epr)
      val q = StrictMath.log(k / rpt) / sinfo
      val temp = StrictMath.exp(q + q)
      var sine = (temp - 1.toDouble) / (temp + 1.toDouble)
      var f1, f2 = 0.0
      for (i <- 0 until 2) {
        f1 = ((StrictMath.log((1.toDouble + sine) / (1.toDouble - sine)) - E *
          StrictMath.log((1.toDouble + E * sine) / (1.toDouble - E * sine))) / 2.toDouble) - q
        f2 = 1.toDouble / (1.toDouble - sine * sine) - ESQ / (1.toDouble - ESQ * sine * sine)
        sine -= (f1/ f2)
      }
      Point(StrictMath.toDegrees(lon) * -1, StrictMath.toDegrees(StrictMath.asin(sine)))
    }
  }

  def fromTransverseMercator(params: Array[Double]) = {
    val cm = params(0)  // CENTRAL MERIDIAN (CM)
    val fe = params(1)  // FALSE EASTING VALUE AT THE CM (METERS)
    val or = params(2) / RAD  // origin latitude
    val sf = 1.0 - (1.0 / params(3)) // scale factor
    val fn = params(4)  // false northing
    // translated from TCONPC subroutine
    val eps = ESQ / (1.0 - ESQ)
    val pr = (1.0 - F) * ER
    val en = (ER - pr) / (ER + pr)
    val en2 = en * en
    val en3 = en * en * en
    val en4 = en2 * en2

    var c2 = -3.0 * en / 2.0 + 9.0 * en3 / 16.0
    var c4 = 15.0d * en2 / 16.0d - 15.0d * en4 /32.0
    var c6 = -35.0 * en3 / 48.0
    var c8 = 315.0 * en4 / 512.0
    val u0 = 2.0 * (c2 - 2.0 * c4 + 3.0 * c6 - 4.0 * c8)
    val u2 = 8.0 * (c4 - 4.0 * c6 + 10.0 * c8)
    val u4 = 32.0 * (c6 - 6.0 * c8)
    val u6 = 128.0 * c8

    c2 = 3.0 * en / 2.0 - 27.0 * en3 / 32.0
    c4 = 21.0 * en2 / 16.0 - 55.0 * en4 / 32.0d
    c6 = 151.0 * en3 / 96.0
    c8 = 1097.0d * en4 / 512.0
    val v0 = 2.0 * (c2 - 2.0 * c4 + 3.0 * c6 - 4.0 * c8)
    val v2 = 8.0 * (c4 - 4.0 * c6 + 10.0 * c8)
    val v4 = 32.0 * (c6 - 6.0 * c8)
    val v6 = 128.0 * c8

    val r = ER * (1.0 - en) * (1.0 - en * en) * (1.0 + 2.25 * en * en + (225.0 / 64.0) * en4)
    val cosor = StrictMath.cos(or)
    val omo = or + StrictMath.sin(or) * cosor *
      (u0 + u2 * cosor * cosor + u4 * StrictMath.pow(cosor, 4) + u6 * StrictMath.pow(cosor, 6))
    val so = sf * r * omo

    (point: Point) => {
      val easting = point.getX()
      val northing = point.getY()
      // translated from TMGEOD subroutine
      val om = (northing - fn + so) / (r * sf)
      val cosom = StrictMath.cos(om)
      val foot = om + StrictMath.sin(om) * cosom *
        (v0 + v2 * cosom * cosom + v4 * StrictMath.pow(cosom, 4) + v6 * StrictMath.pow(cosom, 6))
      val sinf = StrictMath.sin(foot)
      val cosf = StrictMath.cos(foot)
      val tn = sinf / cosf
      val ts = tn * tn
      val ets = eps * cosf * cosf
      val rn = ER * sf / StrictMath.sqrt(1.0 - ESQ * sinf * sinf)
      val q = (easting - fe) / rn
      val qs = q * q
      val b2 = -tn * (1.0 + ets) / 2.0
      val b4 = -(5.0 + 3.0 * ts + ets * (1.0 - 9.0 * ts) - 4.0 * ets * ets) / 12.0
      val b6 = (61.0 + 45.0 * ts * (2.0 + ts) + ets * (46.0 - 252.0 * ts -60.0 * ts * ts)) / 360.0
      val b1 = 1.0
      val b3 = -(1.0 + ts + ts + ets) / 6.0
      val b5 = (5.0 + ts * (28.0 + 24.0 * ts) + ets * (6.0 + 8.0 * ts)) / 120.0
      val b7 = -(61.0 + 662.0 * ts + 1320.0 * ts * ts + 720.0 * StrictMath.pow(ts, 3)) / 5040.0
      val lat = foot + b2 * qs * (1.0 + qs * (b4 + b6 * qs))
      val l = b1 * q * (1.0 + qs * (b3 + qs * (b5 + b7 * qs)))
      val lon = -l / cosf + cm
      Point(StrictMath.toDegrees(lon) * -1, StrictMath.toDegrees(lat))
    }
  }
}
import magellan.Point
defined class NAD83
val transformer: Point => Point = (point: Point) => {
  val from = new NAD83(Map("zone" -> 403)).from()
  val p = point.transform(from)
  Point(3.28084 * p.getX, 3.28084 * p.getY)
}

// add a new column in nad83 coordinates
val uberTransformed = uber
                      .withColumn("nad83", $"point".transform(transformer))
                      .cache()
transformer: magellan.Point => magellan.Point = <function1>
uberTransformed: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [tripId: string, timestamp: string ... 2 more fields]
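
The factor 3.28084 in the transformer converts metres to feet: the NAD83 class works with an ellipsoid and false eastings in metres, while the scaling suggests that the shapefile coordinates are in feet (an assumption; the notebook does not show the .prj contents). A plain-Scala sketch of the conversion, with hypothetical helper names:

```scala
// 1 metre is approximately 3.28084 feet (the factor used in the transformer above).
val FeetPerMetre = 3.28084

def metresToFeet(m: Double): Double = m * FeetPerMetre
def feetToMetres(ft: Double): Double = ft / FeetPerMetre

// False easting of zone 403 from the ZONES table, in metres.
val falseEastingMetres = 2000000.0001016
val falseEastingFeet = metresToFeet(falseEastingMetres)
println(f"false easting: $falseEastingFeet%.1f ft")

// Round trip back to metres.
println(feetToMetres(falseEastingFeet))
```

3.28084 is the rounded international-foot factor. The US survey foot, which is often used for State Plane coordinates, corresponds to about 3.2808333 feet per metre, a difference of roughly two parts per million. Which of the two the shapefile uses is not stated here, so the conversion is best treated as approximate.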
uberTransformed.count()
res48: Long = 1128663
uberTransformed.show(5,false) // nad83 transformed points
+------+-------------------------+-----------------------------+---------------------------------------------+
|tripId|timestamp                |point                        |nad83                                        |
+------+-------------------------+-----------------------------+---------------------------------------------+
|00001 |2007-01-07T10:54:50+00:00|Point(-122.445368, 37.782551)|Point(5999523.477715266, 2113253.7290443885) |
|00001 |2007-01-07T10:54:54+00:00|Point(-122.444586, 37.782745)|Point(5999750.8888492435, 2113319.6570987953)|
|00001 |2007-01-07T10:54:58+00:00|Point(-122.443688, 37.782842)|Point(6000011.08106823, 2113349.5785887106)  |
|00001 |2007-01-07T10:55:02+00:00|Point(-122.442815, 37.782919)|Point(6000263.898268142, 2113372.3716762937) |
|00001 |2007-01-07T10:55:06+00:00|Point(-122.442112, 37.782992)|Point(6000467.566895697, 2113394.7303657546) |
+------+-------------------------+-----------------------------+---------------------------------------------+
only showing top 5 rows
uberTransformed.select("tripId").distinct().count() // number of unique tripIds
res50: Long = 24999

Let's try the join again after appropriate transformation of the coordinate system.

val joined = neighborhoods
              .join(uberTransformed)
              .where($"nad83" within $"polygon")
              .select($"tripId", $"timestamp", explode($"metadata").as(Seq("k", "v")))
              .withColumnRenamed("v", "neighborhood")
              .drop("k")
              .cache()
joined: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [tripId: string, timestamp: string ... 1 more field]
val UberRecordsInNbhdsCount = joined.count() // about 131 seconds for first action (doing broadcast nested loop join, see the plan below)
UberRecordsInNbhdsCount: Long = 1085087
joined.explain
== Physical Plan ==
InMemoryTableScan [tripId#18198, timestamp#18199, neighborhood#18583]
   +- InMemoryRelation [tripId#18198, timestamp#18199, neighborhood#18583], StorageLevel(disk, memory, deserialized, 1 replicas)
         +- *(1) Project [tripId#18198, timestamp#18199, v#18578 AS neighborhood#18583]
            +- *(1) Generate explode(metadata#18269), [tripId#18198, timestamp#18199], false, [k#18577, v#18578]
               +- *(1) Project [metadata#18269, tripId#18198, timestamp#18199]
                  +- *(1) BroadcastNestedLoopJoin BuildLeft, Inner, Within(nad83#18439, polygon#18268)
                     :- BroadcastExchange IdentityBroadcastMode
                     :  +- InMemoryTableScan [polygon#18268, metadata#18269]
                     :        +- InMemoryRelation [polygon#18268, metadata#18269], StorageLevel(disk, memory, deserialized, 1 replicas)
                     :              +- *(1) Scan ShapeFileRelation(dbfs:/datasets/magellan/SFNbhd/,Map(path -> dbfs:/datasets/magellan/SFNbhd/)) [polygon#18268,metadata#18269] PushedFilters: [], ReadSchema: struct<polygon:struct<type:int,xmin:double,ymin:double,xmax:double,ymax:double,indices:array<int>...
                     +- InMemoryTableScan [tripId#18198, timestamp#18199, nad83#18439]
                           +- InMemoryRelation [tripId#18198, timestamp#18199, point#18200, nad83#18439], StorageLevel(disk, memory, deserialized, 1 replicas)
                                 +- *(1) Project [tripId#18198, timestamp#18199, point#18200, transformer(point#18200, <function1>) AS nad83#18439]
                                    +- InMemoryTableScan [point#18200, timestamp#18199, tripId#18198]
                                          +- InMemoryRelation [tripId#18198, timestamp#18199, point#18200], StorageLevel(disk, memory, deserialized, 1 replicas)
                                                +- *(1) SerializeFromObject [staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, assertnotnull(input[0, linef50280319d444903999f4bda88213311128.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$UberRecord, true]).tripId, true, false) AS tripId#18198, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, assertnotnull(input[0, linef50280319d444903999f4bda88213311128.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$UberRecord, true]).timestamp, true, false) AS timestamp#18199, newInstance(class org.apache.spark.sql.types.PointUDT).serialize AS point#18200]
                                                   +- Scan[obj#18197]
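The physical plan above uses a BroadcastNestedLoopJoin with a `Within` predicate, meaning every point is tested against every (broadcast) polygon. The following is a minimal plain-Scala sketch of that idea, assuming a simple ray-casting point-in-polygon test. The names `Pt`, `Poly`, `within` and `nestedLoopJoin` are hypothetical and introduced only for illustration; this is not Magellan's implementation.

```scala
// Hypothetical types for illustration only; these are not Magellan's classes.
final case class Pt(x: Double, y: Double)
final case class Poly(name: String, ring: Vector[Pt])

// Ray casting: toggle `inside` each time a horizontal ray from p crosses an edge.
def within(p: Pt, poly: Poly): Boolean = {
  val r = poly.ring
  var inside = false
  var j = r.length - 1
  for (i <- r.indices) {
    val a = r(i)
    val b = r(j)
    val straddles = (a.y > p.y) != (b.y > p.y)
    if (straddles && p.x < (b.x - a.x) * (p.y - a.y) / (b.y - a.y) + a.x) inside = !inside
    j = i
  }
  inside
}

// Nested-loop join: each point is compared with each polygon, so the cost is |points| * |polys|.
def nestedLoopJoin(points: Seq[Pt], polys: Seq[Poly]): Seq[(Pt, String)] =
  for { p <- points; poly <- polys if within(p, poly) } yield (p, poly.name)

val square = Poly("unit-square", Vector(Pt(0, 0), Pt(1, 0), Pt(1, 1), Pt(0, 1)))
val hits = nestedLoopJoin(Seq(Pt(0.5, 0.5), Pt(2.0, 2.0)), Seq(square))
```

Only the first point falls inside the square, so `hits` contains a single pair. The quadratic number of predicate evaluations is one reason spatial indexing is attractive for larger datasets.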
joined.show(5,false)
+------+-------------------------+-------------------------+
|tripId|timestamp                |neighborhood             |
+------+-------------------------+-------------------------+
|00001 |2007-01-07T10:54:50+00:00|Western Addition         |
|00001 |2007-01-07T10:54:54+00:00|Western Addition         |
|00001 |2007-01-07T10:54:58+00:00|Western Addition         |
|00001 |2007-01-07T10:55:02+00:00|Western Addition         |
|00001 |2007-01-07T10:55:06+00:00|Western Addition         |
+------+-------------------------+-------------------------+
only showing top 5 rows
uberRecordCount - UberRecordsInNbhdsCount // records that fall outside every polygon in the neighborhood shapefile
res54: Long = 43576
joined
  .groupBy($"neighborhood")
  .agg(countDistinct("tripId").as("trips"))
  .orderBy(col("trips").desc)
  .show(5,false)
+-------------------------+-----+
|neighborhood             |trips|
+-------------------------+-----+
|South of Market          |9891 |
|Western Addition         |6794 |
|Downtown/Civic Center    |6697 |
|Financial District       |6038 |
|Mission                  |5620 |
+-------------------------+-----+
only showing top 5 rows
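Each trip contributes many GPS records, so the aggregation above counts distinct `tripId` values rather than rows. A minimal plain-Scala sketch of the same logic on an in-memory collection follows; the `Rec` type, the function name and the sample values are hypothetical and only illustrate the idea.

```scala
// Hypothetical record standing in for one row of the joined dataset.
final case class Rec(tripId: String, neighborhood: String)

// Count distinct trips per neighborhood, largest count first.
def distinctTripsPerNeighborhood(rows: Seq[Rec]): Seq[(String, Int)] =
  rows
    .groupBy(_.neighborhood)
    .map { case (nbhd, rs) => nbhd -> rs.map(_.tripId).distinct.size }
    .toSeq
    .sortBy { case (_, trips) => -trips }

val sample = Seq(
  Rec("00001", "Mission"),
  Rec("00001", "Mission"),          // same trip, second GPS record
  Rec("00002", "Mission"),
  Rec("00002", "South of Market")   // trip 00002 crosses into another neighborhood
)
val tripCounts = distinctTripsPerNeighborhood(sample)
```

Here "Mission" has three records but only two distinct trips, which is the difference between `count` and `countDistinct`.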

Other spatial Algorithms in Spark are being explored for generic and more efficient scalable geospatial analytic tasks

See the Spark Summit East 2016 Talk by Ram on "what next?" and the latest notebooks on NYC taxi datasets in Ram's blogs.

The latest version of Magellan already uses spatial indexing structures.

  • SpatialSpark aims to provide efficient spatial operations using Apache Spark.
    • Spatial Partition
      • Generate a spatial partition from input dataset, currently Fixed-Grid Partition (FGP), Binary-Split Partition (BSP) and Sort-Tile Partition (STP) are supported.
    • Spatial Range Query
      • includes both indexed and non-indexed query (useful for neighbourhood searches)
  • z-order kNN join
    • A space-filling-curve trick to index multi-dimensional metric data into one dimension. See: ieee paper and the slides.
  • AkNN = All k Nearest Neighbours: identify the k nearest neighbours for all nodes simultaneously (cont AkNN is the streaming form of AkNN)
    • need to identify the right resources to do this scalably.
  • spark-knn-graphs: https://github.com/tdebatty/spark-knn-graphs
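
The z-order idea mentioned in the list above can be sketched in a few lines of plain Scala: the bits of the two grid coordinates are interleaved into a single key, so that cells close in 2D tend to be close in the 1D ordering. This is a minimal illustration assuming non-negative 16-bit integer cell coordinates; `spreadBits` and `zOrder` are hypothetical names and not taken from any of the libraries listed.

```scala
// Spread the low 16 bits of v so that a zero bit sits between consecutive bits.
def spreadBits(v: Int): Int = {
  var x = v & 0xFFFF
  x = (x | (x << 8)) & 0x00FF00FF
  x = (x | (x << 4)) & 0x0F0F0F0F
  x = (x | (x << 2)) & 0x33333333
  x = (x | (x << 1)) & 0x55555555
  x
}

// Z-order (Morton) key: x bits occupy even positions, y bits occupy odd positions.
// A Long is returned so that the top y bit never lands in the sign bit.
def zOrder(x: Int, y: Int): Long =
  spreadBits(x).toLong | (spreadBits(y).toLong << 1)

// Sorting grid cells by their key walks the grid along the Z-shaped curve.
val cells = Seq((1, 1), (0, 1), (1, 0), (0, 0))
val sortedCells = cells.sortBy { case (x, y) => zOrder(x, y) }
```

With such a key, a one-dimensional sort or range partition can stand in for a two-dimensional index, at the price of some false neighbours that a later exact distance check must filter out.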


Step 0: Downloading datasets and loading them into dbfs

  • get the Uber data
  • get the San Francisco neighborhood data

getting uber data

(This only needs to be done once per shard!)

%sh ls
all.tsv conf derby.log eventlogs ganglia library-install-logs logs orig_planning_neighborhoods.zip SFNbhd
%sh
wget https://raw.githubusercontent.com/dima42/uber-gps-analysis/master/gpsdata/all.tsv
#wget http://lamastex.org/datasets/public/geospatial/uber/all.tsv
--2019-06-17 20:02:41-- https://raw.githubusercontent.com/dima42/uber-gps-analysis/master/gpsdata/all.tsv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.40.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.40.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 60947802 (58M) [text/plain]
Saving to: ‘all.tsv.1’
[wget download progress output omitted]
81% 236M 0s 48450K .......... .......... .......... .......... .......... 81% 175M 0s 48500K .......... .......... .......... .......... .......... 81% 147M 0s 48550K .......... .......... .......... .......... .......... 81% 157M 0s 48600K .......... .......... .......... .......... .......... 81% 232M 0s 48650K .......... .......... .......... .......... .......... 81% 213M 0s 48700K .......... .......... .......... .......... .......... 81% 161M 0s 48750K .......... .......... .......... .......... .......... 81% 178M 0s 48800K .......... .......... .......... .......... .......... 82% 199M 0s 48850K .......... .......... .......... .......... .......... 82% 226M 0s 48900K .......... .......... .......... .......... .......... 82% 184M 0s 48950K .......... .......... .......... .......... .......... 82% 161M 0s 49000K .......... .......... .......... .......... .......... 82% 202M 0s 49050K .......... .......... .......... .......... .......... 82% 158M 0s 49100K .......... .......... .......... .......... .......... 82% 207M 0s 49150K .......... .......... .......... .......... .......... 82% 175M 0s 49200K .......... .......... .......... .......... .......... 82% 215M 0s 49250K .......... .......... .......... .......... .......... 82% 162M 0s 49300K .......... .......... .......... .......... .......... 82% 253M 0s 49350K .......... .......... .......... .......... .......... 82% 198M 0s 49400K .......... .......... .......... .......... .......... 83% 181M 0s 49450K .......... .......... .......... .......... .......... 83% 196M 0s 49500K .......... .......... .......... .......... .......... 83% 168M 0s 49550K .......... .......... .......... .......... .......... 83% 227M 0s 49600K .......... .......... .......... .......... .......... 83% 159M 0s 49650K .......... .......... .......... .......... .......... 83% 202M 0s 49700K .......... .......... .......... .......... .......... 83% 210M 0s 49750K .......... .......... .......... .......... .......... 
83% 161M 0s 49800K .......... .......... .......... .......... .......... 83% 183M 0s 49850K .......... .......... .......... .......... .......... 83% 145M 0s 49900K .......... .......... .......... .......... .......... 83% 216M 0s 49950K .......... .......... .......... .......... .......... 84% 153M 0s 50000K .......... .......... .......... .......... .......... 84% 238M 0s 50050K .......... .......... .......... .......... .......... 84% 223M 0s 50100K .......... .......... .......... .......... .......... 84% 174M 0s 50150K .......... .......... .......... .......... .......... 84% 156M 0s 50200K .......... .......... .......... .......... .......... 84% 164M 0s 50250K .......... .......... .......... .......... .......... 84% 212M 0s 50300K .......... .......... .......... .......... .......... 84% 180M 0s 50350K .......... .......... .......... .......... .......... 84% 191M 0s 50400K .......... .......... .......... .......... .......... 84% 184M 0s 50450K .......... .......... .......... .......... .......... 84% 216M 0s 50500K .......... .......... .......... .......... .......... 84% 247M 0s 50550K .......... .......... .......... .......... .......... 85% 149M 0s 50600K .......... .......... .......... .......... .......... 85% 223M 0s 50650K .......... .......... .......... .......... .......... 85% 162M 0s 50700K .......... .......... .......... .......... .......... 85% 221M 0s 50750K .......... .......... .......... .......... .......... 85% 158M 0s 50800K .......... .......... .......... .......... .......... 85% 192M 0s 50850K .......... .......... .......... .......... .......... 85% 225M 0s 50900K .......... .......... .......... .......... .......... 85% 185M 0s 50950K .......... .......... .......... .......... .......... 85% 197M 0s 51000K .......... .......... .......... .......... .......... 85% 177M 0s 51050K .......... .......... .......... .......... .......... 85% 255M 0s 51100K .......... .......... .......... .......... .......... 
85% 216M 0s 51150K .......... .......... .......... .......... .......... 86% 219M 0s 51200K .......... .......... .......... .......... .......... 86% 337M 0s 51250K .......... .......... .......... .......... .......... 86% 196M 0s 51300K .......... .......... .......... .......... .......... 86% 339M 0s 51350K .......... .......... .......... .......... .......... 86% 357M 0s 51400K .......... .......... .......... .......... .......... 86% 353M 0s 51450K .......... .......... .......... .......... .......... 86% 187M 0s 51500K .......... .......... .......... .......... .......... 86% 195M 0s 51550K .......... .......... .......... .......... .......... 86% 201M 0s 51600K .......... .......... .......... .......... .......... 86% 159M 0s 51650K .......... .......... .......... .......... .......... 86% 171M 0s 51700K .......... .......... .......... .......... .......... 86% 176M 0s 51750K .......... .......... .......... .......... .......... 87% 175M 0s 51800K .......... .......... .......... .......... .......... 87% 216M 0s 51850K .......... .......... .......... .......... .......... 87% 153M 0s 51900K .......... .......... .......... .......... .......... 87% 237M 0s 51950K .......... .......... .......... .......... .......... 87% 205M 0s 52000K .......... .......... .......... .......... .......... 87% 253M 0s 52050K .......... .......... .......... .......... .......... 87% 224M 0s 52100K .......... .......... .......... .......... .......... 87% 174M 0s 52150K .......... .......... .......... .......... .......... 87% 243M 0s 52200K .......... .......... .......... .......... .......... 87% 200M 0s 52250K .......... .......... .......... .......... .......... 87% 133M 0s 52300K .......... .......... .......... .......... .......... 87% 240M 0s 52350K .......... .......... .......... .......... .......... 88% 214M 0s 52400K .......... .......... .......... .......... .......... 88% 149M 0s 52450K .......... .......... .......... .......... .......... 
88% 179M 0s 52500K .......... .......... .......... .......... .......... 88% 188M 0s 52550K .......... .......... .......... .......... .......... 88% 166M 0s 52600K .......... .......... .......... .......... .......... 88% 238M 0s 52650K .......... .......... .......... .......... .......... 88% 150M 0s 52700K .......... .......... .......... .......... .......... 88% 165M 0s 52750K .......... .......... .......... .......... .......... 88% 185M 0s 52800K .......... .......... .......... .......... .......... 88% 250M 0s 52850K .......... .......... .......... .......... .......... 88% 177M 0s 52900K .......... .......... .......... .......... .......... 88% 172M 0s 52950K .......... .......... .......... .......... .......... 89% 160M 0s 53000K .......... .......... .......... .......... .......... 89% 185M 0s 53050K .......... .......... .......... .......... .......... 89% 160M 0s 53100K .......... .......... .......... .......... .......... 89% 156M 0s 53150K .......... .......... .......... .......... .......... 89% 141M 0s 53200K .......... .......... .......... .......... .......... 89% 166M 0s 53250K .......... .......... .......... .......... .......... 89% 176M 0s 53300K .......... .......... .......... .......... .......... 89% 217M 0s 53350K .......... .......... .......... .......... .......... 89% 184M 0s 53400K .......... .......... .......... .......... .......... 89% 164M 0s 53450K .......... .......... .......... .......... .......... 89% 183M 0s 53500K .......... .......... .......... .......... .......... 89% 252M 0s 53550K .......... .......... .......... .......... .......... 90% 200M 0s 53600K .......... .......... .......... .......... .......... 90% 154M 0s 53650K .......... .......... .......... .......... .......... 90% 214M 0s 53700K .......... .......... .......... .......... .......... 90% 254M 0s 53750K .......... .......... .......... .......... .......... 90% 246M 0s 53800K .......... .......... .......... .......... .......... 
90% 185M 0s 53850K .......... .......... .......... .......... .......... 90% 126M 0s 53900K .......... .......... .......... .......... .......... 90% 193M 0s 53950K .......... .......... .......... .......... .......... 90% 255M 0s 54000K .......... .......... .......... .......... .......... 90% 185M 0s 54050K .......... .......... .......... .......... .......... 90% 143M 0s 54100K .......... .......... .......... .......... .......... 90% 150M 0s 54150K .......... .......... .......... .......... .......... 91% 233M 0s 54200K .......... .......... .......... .......... .......... 91% 182M 0s 54250K .......... .......... .......... .......... .......... 91% 141M 0s 54300K .......... .......... .......... .......... .......... 91% 256M 0s 54350K .......... .......... .......... .......... .......... 91% 238M 0s 54400K .......... .......... .......... .......... .......... 91% 218M 0s 54450K .......... .......... .......... .......... .......... 91% 148M 0s 54500K .......... .......... .......... .......... .......... 91% 179M 0s 54550K .......... .......... .......... .......... .......... 91% 252M 0s 54600K .......... .......... .......... .......... .......... 91% 186M 0s 54650K .......... .......... .......... .......... .......... 91% 137M 0s 54700K .......... .......... .......... .......... .......... 91% 255M 0s 54750K .......... .......... .......... .......... .......... 92% 256M 0s 54800K .......... .......... .......... .......... .......... 92% 239M 0s 54850K .......... .......... .......... .......... .......... 92% 144M 0s 54900K .......... .......... .......... .......... .......... 92% 156M 0s 54950K .......... .......... .......... .......... .......... 92% 218M 0s 55000K .......... .......... .......... .......... .......... 92% 239M 0s 55050K .......... .......... .......... .......... .......... 92% 130M 0s 55100K .......... .......... .......... .......... .......... 92% 158M 0s 55150K .......... .......... .......... .......... .......... 
92% 200M 0s 55200K .......... .......... .......... .......... .......... 92% 249M 0s 55250K .......... .......... .......... .......... .......... 92% 169M 0s 55300K .......... .......... .......... .......... .......... 92% 154M 0s 55350K .......... .......... .......... .......... .......... 93% 241M 0s 55400K .......... .......... .......... .......... .......... 93% 172M 0s 55450K .......... .......... .......... .......... .......... 93% 194M 0s 55500K .......... .......... .......... .......... .......... 93% 164M 0s 55550K .......... .......... .......... .......... .......... 93% 165M 0s 55600K .......... .......... .......... .......... .......... 93% 187M 0s 55650K .......... .......... .......... .......... .......... 93% 229M 0s 55700K .......... .......... .......... .......... .......... 93% 206M 0s 55750K .......... .......... .......... .......... .......... 93% 179M 0s 55800K .......... .......... .......... .......... .......... 93% 343M 0s 55850K .......... .......... .......... .......... .......... 93% 195M 0s 55900K .......... .......... .......... .......... .......... 94% 267M 0s 55950K .......... .......... .......... .......... .......... 94% 332M 0s 56000K .......... .......... .......... .......... .......... 94% 165M 0s 56050K .......... .......... .......... .......... .......... 94% 251M 0s 56100K .......... .......... .......... .......... .......... 94% 350M 0s 56150K .......... .......... .......... .......... .......... 94% 244M 0s 56200K .......... .......... .......... .......... .......... 94% 238M 0s 56250K .......... .......... .......... .......... .......... 94% 261M 0s 56300K .......... .......... .......... .......... .......... 94% 174M 0s 56350K .......... .......... .......... .......... .......... 94% 215M 0s 56400K .......... .......... .......... .......... .......... 94% 313M 0s 56450K .......... .......... .......... .......... .......... 94% 157M 0s 56500K .......... .......... .......... .......... .......... 
95% 193M 0s 56550K .......... .......... .......... .......... .......... 95% 184M 0s 56600K .......... .......... .......... .......... .......... 95% 255M 0s 56650K .......... .......... .......... .......... .......... 95% 188M 0s 56700K .......... .......... .......... .......... .......... 95% 217M 0s 56750K .......... .......... .......... .......... .......... 95% 202M 0s 56800K .......... .......... .......... .......... .......... 95% 195M 0s 56850K .......... .......... .......... .......... .......... 95% 199M 0s 56900K .......... .......... .......... .......... .......... 95% 227M 0s 56950K .......... .......... .......... .......... .......... 95% 200M 0s 57000K .......... .......... .......... .......... .......... 95% 192M 0s 57050K .......... .......... .......... .......... .......... 95% 172M 0s 57100K .......... .......... .......... .......... .......... 96% 230M 0s 57150K .......... .......... .......... .......... .......... 96% 175M 0s 57200K .......... .......... .......... .......... .......... 96% 192M 0s 57250K .......... .......... .......... .......... .......... 96% 185M 0s 57300K .......... .......... .......... .......... .......... 96% 223M 0s 57350K .......... .......... .......... .......... .......... 96% 206M 0s 57400K .......... .......... .......... .......... .......... 96% 223M 0s 57450K .......... .......... .......... .......... .......... 96% 159M 0s 57500K .......... .......... .......... .......... .......... 96% 217M 0s 57550K .......... .......... .......... .......... .......... 96% 216M 0s 57600K .......... .......... .......... .......... .......... 96% 218M 0s 57650K .......... .......... .......... .......... .......... 96% 202M 0s 57700K .......... .......... .......... .......... .......... 97% 217M 0s 57750K .......... .......... .......... .......... .......... 97% 207M 0s 57800K .......... .......... .......... .......... .......... 97% 209M 0s 57850K .......... .......... .......... .......... .......... 
97% 165M 0s 57900K .......... .......... .......... .......... .......... 97% 218M 0s 57950K .......... .......... .......... .......... .......... 97% 197M 0s 58000K .......... .......... .......... .......... .......... 97% 206M 0s 58050K .......... .......... .......... .......... .......... 97% 195M 0s 58100K .......... .......... .......... .......... .......... 97% 208M 0s 58150K .......... .......... .......... .......... .......... 97% 212M 0s 58200K .......... .......... .......... .......... .......... 97% 216M 0s 58250K .......... .......... .......... .......... .......... 97% 188M 0s 58300K .......... .......... .......... .......... .......... 98% 209M 0s 58350K .......... .......... .......... .......... .......... 98% 207M 0s 58400K .......... .......... .......... .......... .......... 98% 210M 0s 58450K .......... .......... .......... .......... .......... 98% 190M 0s 58500K .......... .......... .......... .......... .......... 98% 198M 0s 58550K .......... .......... .......... .......... .......... 98% 221M 0s 58600K .......... .......... .......... .......... .......... 98% 195M 0s 58650K .......... .......... .......... .......... .......... 98% 176M 0s 58700K .......... .......... .......... .......... .......... 98% 237M 0s 58750K .......... .......... .......... .......... .......... 98% 244M 0s 58800K .......... .......... .......... .......... .......... 98% 208M 0s 58850K .......... .......... .......... .......... .......... 98% 195M 0s 58900K .......... .......... .......... .......... .......... 99% 222M 0s 58950K .......... .......... .......... .......... .......... 99% 343M 0s 59000K .......... .......... .......... .......... .......... 99% 231M 0s 59050K .......... .......... .......... .......... .......... 99% 191M 0s 59100K .......... .......... .......... .......... .......... 99% 207M 0s 59150K .......... .......... .......... .......... .......... 99% 227M 0s 59200K .......... .......... .......... .......... .......... 
99% 232M 0s 59250K .......... .......... .......... .......... .......... 99% 313M 0s 59300K .......... .......... .......... .......... .......... 99% 357M 0s 59350K .......... .......... .......... .......... .......... 99% 352M 0s 59400K .......... .......... .......... .......... .......... 99% 357M 0s 59450K .......... .......... .......... .......... .......... 99% 299M 0s 59500K .......... ......... 100% 416M=0.4s 2019-06-17 20:02:42 (159 MB/s) - ‘all.tsv.1’ saved [60947802/60947802]
%sh
pwd
/databricks/driver
dbutils.fs.mkdirs("dbfs:/datasets/magellan") //need not be done again!
res92: Boolean = true
dbutils.fs.cp("file:/databricks/driver/all.tsv", "dbfs:/datasets/magellan/") // load into dbfs
res93: Boolean = true
display(dbutils.fs.ls("dbfs:/datasets/magellan/"))
path                              name      size
dbfs:/datasets/magellan/SFNbhd/   SFNbhd/   0
dbfs:/datasets/magellan/all.tsv   all.tsv   60947802

Getting SF Neighborhood Data

%sh
wget http://www.lamastex.org/courses/ScalableDataScience/2016/datasets/magellan/UberSF/planning_neighborhoods.zip
--2019-06-17 20:02:44-- http://www.lamastex.org/courses/ScalableDataScience/2016/datasets/magellan/UberSF/planning_neighborhoods.zip
Resolving www.lamastex.org (www.lamastex.org)... 166.62.28.100
Connecting to www.lamastex.org (www.lamastex.org)|166.62.28.100|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 163771 (160K) [application/zip]
Saving to: ‘planning_neighborhoods.zip’
0K .......... .......... .......... .......... .......... 31% 149K 1s
50K .......... .......... .......... .......... .......... 62% 297K 0s
100K .......... .......... .......... .......... .......... 93% 298K 0s
150K ......... 100% 69.7M=0.7s
2019-06-17 20:02:45 (238 KB/s) - ‘planning_neighborhoods.zip’ saved [163771/163771]
%sh
unzip planning_neighborhoods.zip
Archive: planning_neighborhoods.zip
  inflating: planning_neighborhoods.dbf
  inflating: planning_neighborhoods.shx
  inflating: planning_neighborhoods.shp.xml
  inflating: planning_neighborhoods.shp
  inflating: planning_neighborhoods.sbx
  inflating: planning_neighborhoods.sbn
  inflating: planning_neighborhoods.prj
%sh
ls -al
total 119652
drwxr-xr-x 1 root root 4096 Jun 17 20:02 .
drwxr-xr-x 1 root root 4096 Jun 17 19:47 ..
-rw-r--r-- 1 root root 60947802 Jun 17 20:00 all.tsv
-rw-r--r-- 1 root root 60947802 Jun 17 20:02 all.tsv.1
drwxr-xr-x 2 root root 4096 Jan 1 1970 conf
-rw-r--r-- 1 root root 731 Jun 17 19:47 derby.log
drwxr-xr-x 3 root root 4096 Jun 17 19:47 eventlogs
drwxr-xr-x 2 root root 4096 Jun 17 20:00 ganglia
drwxr-xr-x 2 root root 4096 Jun 17 19:48 library-install-logs
drwxr-xr-x 2 root root 4096 Jun 17 20:00 logs
-rw-r--r-- 1 root root 163771 Nov 9 2015 orig_planning_neighborhoods.zip
-rw-r--r-- 1 root root 1028 Jan 20 2012 planning_neighborhoods.dbf
-rw-r--r-- 1 root root 567 Jan 20 2012 planning_neighborhoods.prj
-rw-r--r-- 1 root root 516 Jan 20 2012 planning_neighborhoods.sbn
-rw-r--r-- 1 root root 164 Jan 20 2012 planning_neighborhoods.sbx
-rw-r--r-- 1 root root 214576 Jan 20 2012 planning_neighborhoods.shp
-rw-r--r-- 1 root root 21958 Jan 20 2012 planning_neighborhoods.shp.xml
-rw-r--r-- 1 root root 396 Jan 20 2012 planning_neighborhoods.shx
-rw-r--r-- 1 root root 163771 Nov 9 2015 planning_neighborhoods.zip
drwxr-xr-x 2 root root 4096 Jun 17 20:01 SFNbhd
%sh 
mv planning_neighborhoods.zip orig_planning_neighborhoods.zip

Let's prepare the files in a local directory named SFNbhd

  • make a directory called SFNbhd using the command mkdir SFNbhd
  • if that succeeds (&& runs the next command only when the previous one exits without error), move the files starting with planning_nei into the directory SFNbhd by:
    • mv planning_nei* SFNbhd
  • list the contents of the current directory using ls
  • finally, list the contents of the directory SFNbhd inside the current directory using ls -al SFNbhd
%sh
mkdir SFNbhd && mv planning_nei* SFNbhd && ls 
ls -al SFNbhd
mkdir: cannot create directory ‘SFNbhd’: File exists
total 264
drwxr-xr-x 2 root root 4096 Jun 17 20:01 .
drwxr-xr-x 1 root root 4096 Jun 17 20:02 ..
-rw-r--r-- 1 root root 1028 Jan 20 2012 planning_neighborhoods.dbf
-rw-r--r-- 1 root root 567 Jan 20 2012 planning_neighborhoods.prj
-rw-r--r-- 1 root root 516 Jan 20 2012 planning_neighborhoods.sbn
-rw-r--r-- 1 root root 164 Jan 20 2012 planning_neighborhoods.sbx
-rw-r--r-- 1 root root 214576 Jan 20 2012 planning_neighborhoods.shp
-rw-r--r-- 1 root root 21958 Jan 20 2012 planning_neighborhoods.shp.xml
-rw-r--r-- 1 root root 396 Jan 20 2012 planning_neighborhoods.shx
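The output above begins with mkdir: cannot create directory ‘SFNbhd’: File exists. Because the commands are chained with &&, a failed mkdir stops the chain, so on a re-run the mv and the first ls are skipped and only ls -al SFNbhd runs. One possible re-runnable variant is sketched below. It is not part of the original notebook: the scratch directory, the empty stand-in files and the function name prepare_sfnbhd are assumptions made so the sketch can be tried anywhere. It relies on mkdir -p, which does not report an error when the directory already exists.

```shell
# Sketch only: a re-runnable version of the directory-preparation step.
set -eu

# Assumption: work in a scratch directory with empty stand-in files
# instead of the real unzipped shapefile components.
workdir="$(mktemp -d)"
cd "$workdir"
touch planning_neighborhoods.dbf planning_neighborhoods.shp

# Hypothetical helper name, introduced for illustration.
prepare_sfnbhd() {
  mkdir -p SFNbhd                 # -p: succeed even if SFNbhd already exists
  for f in planning_nei*; do
    if [ -e "$f" ]; then          # skip the unexpanded pattern when nothing matches
      mv "$f" SFNbhd/
    fi
  done
  ls -al SFNbhd
}

prepare_sfnbhd   # first run: creates SFNbhd and moves the files
prepare_sfnbhd   # second run: nothing left to move, still succeeds
```

Calling the function twice mimics re-running the notebook cell; the second call finds nothing left to move and only lists the directory.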
dbutils.fs.mkdirs("dbfs:/datasets/magellan/SFNbhd") //make the directory in dbfs - need not be done again!
res94: Boolean = true
// copy each file individually - done for pedagogical reasons; we can do more sophisticated dbfs loads for large shape files
dbutils.fs.cp("file:/databricks/driver/SFNbhd/planning_neighborhoods.dbf", "dbfs:/datasets/magellan/SFNbhd/")
dbutils.fs.cp("file:/databricks/driver/SFNbhd/planning_neighborhoods.prj", "dbfs:/datasets/magellan/SFNbhd/")
dbutils.fs.cp("file:/databricks/driver/SFNbhd/planning_neighborhoods.sbn", "dbfs:/datasets/magellan/SFNbhd/")
dbutils.fs.cp("file:/databricks/driver/SFNbhd/planning_neighborhoods.sbx", "dbfs:/datasets/magellan/SFNbhd/")
dbutils.fs.cp("file:/databricks/driver/SFNbhd/planning_neighborhoods.shp", "dbfs:/datasets/magellan/SFNbhd/")
dbutils.fs.cp("file:/databricks/driver/SFNbhd/planning_neighborhoods.shp.xml", "dbfs:/datasets/magellan/SFNbhd/")
dbutils.fs.cp("file:/databricks/driver/SFNbhd/planning_neighborhoods.shx", "dbfs:/datasets/magellan/SFNbhd/")
res95: Boolean = true
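The seven copy calls above differ only in the file extension, so the same step can be written as a loop. The sketch below shows the idea in plain shell rather than with dbutils. Everything in it is an assumption for illustration: src is a scratch directory filled with empty stand-in files, and DEST is a hypothetical variable that defaults to a second scratch directory. On a cluster where DBFS is mounted on the driver's local filesystem, DEST might point at the mounted equivalent of dbfs:/datasets/magellan/SFNbhd/, but that is not shown in this notebook.

```shell
# Sketch only: copy all shapefile components in one loop.
set -eu

# Assumption: scratch source directory standing in for /databricks/driver/SFNbhd.
src="$(mktemp -d)"
# Assumption: DEST is a hypothetical override; defaults to a scratch directory.
dest="${DEST:-$(mktemp -d)}"

# Create empty stand-ins for the seven components listed in the notebook.
for ext in dbf prj sbn sbx shp shp.xml shx; do
  touch "$src/planning_neighborhoods.$ext"
done

mkdir -p "$dest"
for f in "$src"/planning_neighborhoods.*; do
  cp "$f" "$dest/"                # one copy per component, as in the cell above
done

ls "$dest"
```

A glob keeps the loop in step with whatever components the archive contains, at the cost of also copying any unrelated file that happens to match the pattern.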
display(dbutils.fs.ls("dbfs:/datasets/magellan/SFNbhd/"))
path                                                            name                             size
dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.dbf       planning_neighborhoods.dbf       1028
dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.prj       planning_neighborhoods.prj       567
dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.sbn       planning_neighborhoods.sbn       516
dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.sbx       planning_neighborhoods.sbx       164
dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.shp       planning_neighborhoods.shp       214576
dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.shp.xml   planning_neighborhoods.shp.xml   21958
dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.shx       planning_neighborhoods.shx       396

End of Step 0: downloading and putting data in dbfs