06. Statistics and List Comprehensions with New Zealand Earthquakes

Mathematical Statistical and Computational Foundations for Data Scientists

©2018 Raazesh Sainudiin. Attribution 4.0 International (CC BY 4.0)

  • Live Data-fetch of NZ EQ Data
  • More on Statistics
  • Sample Mean
  • Sample Variance
  • Order Statistics
  • Frequencies
  • Empirical Mass Function
  • Empirical Distribution Function
  • List Comprehensions
  • New Zealand Earthquakes
  • Functions Revision

Live Data-fetching Exercise Now

Go to https://quakesearch.geonet.org.nz/ and download data on NZ earthquakes.

[Image: GeoNet Quake Search page showing the CSV download option (images/GeoNetQuakeSearchDownloadCSV.png)]

In my attempt above, I zoomed out to include both islands of New Zealand (NZ) and chose one year of data with the Last Year button on this site:

  • https://quakesearch.geonet.org.nz/

Hitting the Search box then gave the following URLs for downloading the data. I used the DOWNLOAD button to get my own data in the CSV Output Format chosen earlier.

https://quakesearch.geonet.org.nz/csv?bbox=163.52051,-49.23912,182.19727,-32.36140&startdate=2017-06-01&enddate=2018-05-17T14:00:00

https://quakesearch.geonet.org.nz/csv?bbox=163.52051,-49.23912,182.19727,-32.36140&startdate=2017-5-17T13:00:00&enddate=2017-06-01

What should you do now?

Try to DOWNLOAD your own CSV data and store it in a file named my_earthquakes.csv inside the folder named data, which sits in the same directory as this notebook. (NOTE: rename the file when you download it so that you don't overwrite the file earthquakes.csv!)
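If you prefer to fetch the file from code, here is a minimal Python sketch using only the standard library (`urllib`). It assumes that the CSV endpoint and the `bbox`, `startdate` and `enddate` query parameters seen in the URLs above still work; the helper names `build_quakesearch_url` and `download_csv` are hypothetical and not part of any GeoNet API.

```python
from urllib.parse import urlencode
from urllib.request import urlretrieve

# Base endpoint taken from the download URLs shown above.
QUAKESEARCH_CSV = "https://quakesearch.geonet.org.nz/csv"


def build_quakesearch_url(bbox, startdate, enddate):
    """Hypothetical helper: build a GeoNet Quake Search CSV URL.

    bbox is (min_lon, min_lat, max_lon, max_lat), in the same order as the
    bbox parameter of the URLs above; startdate/enddate are date strings.
    """
    params = {
        # five decimals, so -32.36140 keeps its trailing zero as in the URLs above
        "bbox": ",".join(f"{v:.5f}" for v in bbox),
        "startdate": startdate,
        "enddate": enddate,
    }
    # Leave commas and colons unescaped so the URL matches the ones above.
    return QUAKESEARCH_CSV + "?" + urlencode(params, safe=",:")


def download_csv(url, path="data/my_earthquakes.csv"):
    """Hypothetical helper: save the CSV at url to path (needs network access)."""
    filename, _headers = urlretrieve(url, path)
    return filename
```

For example, `download_csv(build_quakesearch_url((163.52051, -49.23912, 182.19727, -32.36140), "2017-06-01", "2018-05-17T14:00:00"))` would try to save the first of the two downloads above to data/my_earthquakes.csv. The data folder must already exist, and the request needs network access.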

In [1]:
%%sh
# print working directory
pwd
/home/raazesh/all/git/scalable-data-science/_360-in-525/2018/04/jp
In [4]:
%%sh
ls # list contents of working directory
360-in-525-04_00.html
360-in-525-04_00.ipynb
360-in-525-04_00.md
360-in-525-04_01.html
360-in-525-04_01.ipynb
360-in-525-04_01.md
360-in-525-04_02.html
360-in-525-04_02.ipynb
360-in-525-04_02.md
360-in-525-04_03.html
360-in-525-04_03.ipynb
360-in-525-04_03.md
360-in-525-04_04.html
360-in-525-04_04.ipynb
360-in-525-04_04.md
360-in-525-04_05.html
360-in-525-04_05.ipynb
360-in-525-04_05.md
360-in-525-04_06.ipynb
data
images
mydir
sageMathIpynbArchive
In [2]:
%%sh
# after download you should have the following file in directory named data
ls data
earthquakes.csv
In [15]:
%%sh  
# first three lines
head -3 data/earthquakes.csv
publicid,eventtype,origintime,modificationtime,longitude, latitude, magnitude, depth,magnitudetype,depthtype,evaluationmethod,evaluationstatus,evaluationmode,earthmodel,usedphasecount,usedstationcount,magnitudestationcount,minimumdistance,azimuthalgap,originerror,magnitudeuncertainty
2018p368955,,2018-05-17T12:19:35.516Z,2018-05-17T12:21:54.953Z,178.4653957,-37.51944533,2.209351541,20.9375,M,,NonLinLoc,,automatic,nz3drx,12,12,6,0.1363924727,261.0977462,0.8209633086,0
2018p368878,,2018-05-17T11:38:24.646Z,2018-05-17T11:40:26.254Z,177.8775115,-37.46115663,2.155154561,58.4375,M,,NonLinLoc,,automatic,nz3drx,11,11,7,0.3083220739,232.7487132,0.842884174,0
In [14]:
%%sh 
# last three lines
tail -3 data/earthquakes.csv
2017p408155,earthquake,2017-06-01T00:25:06.491Z,2017-06-19T03:05:19.468Z,172.4586182,-42.75725555,2.265096095,11.19184685,M,,LOCSAT,confirmed,manual,iasp91,26,24,13,0.1402618885,61.97485352,0.6915929023,0
2017p408137,earthquake,2017-06-01T00:15:55.074Z,2017-06-19T03:02:21.311Z,176.7870483,-37.73429108,2.276142644,3.937263489,M,,LOCSAT,confirmed,manual,iasp91,28,23,15,0.155876264,96.37628555,0.5852246521,0
2017p408120,earthquake,2017-06-01T00:07:04.890Z,2017-06-01T07:20:23.994Z,175.4930025,-39.31558765,1.298107247,13.5546875,M,,NonLinLoc,confirmed,manual,nz3drx,28,19,13,0.04550182409,86.69529793,0.2189521352,0
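The same peeks can be done in Python without the shell. In this small sketch (the helper names `head` and `tail` are hypothetical), `itertools.islice` stops after the first n lines, and a `collections.deque` with `maxlen=n` streams through the file keeping only the last n lines. Comparing the two outputs above also shows that the file is in reverse chronological order: the newest events (2018-05-17) come first and the oldest (2017-06-01) come last.

```python
from collections import deque
from itertools import islice


def head(lines, n=3):
    """Return the first n lines, like `head -n`."""
    return list(islice(lines, n))


def tail(lines, n=3):
    """Return the last n lines, like `tail -n`.

    The bounded deque discards older lines as it goes, so only n lines
    are held in memory at any time.
    """
    return list(deque(lines, maxlen=n))
```

Usage on the file: `with open("data/earthquakes.csv") as f: print("".join(tail(f)))`. Lines read from a file keep their trailing newline, which is why they can be joined with an empty string.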
In [13]:
%%sh  
# number of lines in the file; mnemonic from `man wc`: wc = word count, and option -l counts lines
wc -l  data/earthquakes.csv
21017 data/earthquakes.csv
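Since the first line is the header, the number of earthquake records is one less than the line count. A minimal Python sketch (the name `count_records` is hypothetical), assuming one header line and one record per line, with no quoted field spanning several lines:

```python
def count_records(lines):
    """Count data rows in a CSV that has a single header line.

    Iterating over a file yields one item per line, like `wc -l`, except
    that a final line without a trailing newline is still counted here.
    """
    n_lines = sum(1 for _ in lines)
    return max(n_lines - 1, 0)  # drop the header line
```

With the 21017 lines reported above, and assuming the file ends with a newline (`wc -l` counts newline characters), `count_records(open("data/earthquakes.csv"))` would give 21016 earthquakes.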

Let's analyse the measured earthquakes in data/earthquakes.csv.

This will ensure we are all looking at the same file!

But feel free to play with your own data/my_earthquakes.csv on the side.
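A first step in that analysis is to read the CSV into Python. Below is a minimal sketch with the standard `csv` module; the names `read_quakes` and `column_as_floats` are hypothetical. It strips whitespace from the column names, because the header printed by `head -3` above contains names such as ` latitude` with a leading space, and it uses list comprehensions both to build the records and to pull out one numeric column.

```python
import csv


def read_quakes(lines):
    """Parse earthquake CSV text into a list of dicts keyed by column name.

    lines can be an open file or any iterable of strings. Header names are
    stripped because some of them (e.g. ' latitude') carry a leading space.
    """
    reader = csv.reader(lines)
    header = [name.strip() for name in next(reader)]
    return [dict(zip(header, row)) for row in reader]


def column_as_floats(rows, name):
    """Return one numeric column (e.g. 'magnitude') as a list of floats."""
    return [float(row[name]) for row in rows]


# Usage, assuming the file is at data/earthquakes.csv:
# with open("data/earthquakes.csv", newline="") as f:
#     quakes = read_quakes(f)
# magnitudes = column_as_floats(quakes, "magnitude")
```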


More on Statistics

Recall that a statistic is any measurable function of the data: $T(x): \mathbb{X} \rightarrow \mathbb{T}$.

Thus, a statistic $T$ is also an RV that takes values in the space $\mathbb{T}$.

When $x \in \mathbb{X}$ is the observed data, $T(x)=t$ is the observed statistic of the observed data $x$.
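In code, a statistic is simply a function that takes the whole data set $x$ and returns a value $t = T(x)$. Here is a small Python sketch with two statistics from this notebook's outline: the sample mean and the maximum (the largest order statistic). The five ball numbers in `x` are made up for illustration and are not real lotto data.

```python
from statistics import mean


def T_mean(x):
    """Sample mean: a statistic mapping data x to a real number."""
    return mean(x)


def T_max(x):
    """Largest observation (the largest order statistic): another statistic."""
    return max(x)


# Illustrative observed data x (made-up ball numbers between 1 and 40).
x = (3, 17, 40, 1, 22)
t = T_mean(x)  # the observed statistic t = T(x), here 83/5
```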

Example 1: New Zealand Lotto and 1114 IID de Moivre RVs

Let's go back to our New Zealand lotto data.

We showed that for New Zealand lotto (40 balls in the machine, numbered $1, 2, \ldots, 40$), the number on the first ball out of the machine can be modelled as a de Moivre$(\frac{1}{40}, \frac{1}{40}, \ldots, \frac{1}{40})$ RV.
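As a quick check on this model, the following sketch builds the de Moivre$(\frac{1}{40}, \ldots, \frac{1}{40})$ probability mass function with exact fractions and verifies that it sums to 1. It also computes $E(X) = \sum_{k=1}^{40} k \cdot \frac{1}{40} = \frac{41}{2}$ and $V(X) = \frac{40^2 - 1}{12} = \frac{533}{4}$, the mean and variance of a uniform draw from $\{1, 2, \ldots, 40\}$.

```python
from fractions import Fraction

K = 40  # balls numbered 1, 2, ..., 40

# de Moivre(1/40, ..., 1/40) probability mass function on {1, ..., 40}
pmf = {k: Fraction(1, K) for k in range(1, K + 1)}

total = sum(pmf.values())                         # exactly 1
expectation = sum(k * p for k, p in pmf.items())  # E(X) = 41/2
variance = sum((k - expectation) ** 2 * p for k, p in pmf.items())  # V(X)
```

Using `Fraction` instead of floats keeps every probability exact, so the checks are equalities rather than approximate comparisons.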

We have the New Zealand Lotto results for 1114 draws, from 1 August 1987 to 10 November 2008 (retrieved from the NZ lotto web site: http://lotto.nzpages.co.nz/previousresults.html ).

We can think of this data as $x$, the realisation of a random vector $X = (X_1, X_2,\ldots, X_{1114})$ where $X_1, X_2,\ldots, X_{1114} \overset{IID}{\thicksim} \text{de Moivre}(\frac{1}{40}, \frac{1}{40}, \ldots, \frac{1}{40})$.
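To get a feel for one realisation of such a random vector, we can simulate it. This sketch draws 1114 IID de Moivre$(\frac{1}{40}, \ldots, \frac{1}{40})$ values with a list comprehension and Python's `random` module. The seed 2018 is an arbitrary choice, and the result is simulated, not the actual lotto data.

```python
import random

rng = random.Random(2018)  # seeded so the simulation is reproducible

n_draws = 1114
# One simulated realisation of X = (X_1, ..., X_1114): each draw is uniform
# on {1, ..., 40}, independent of the others.
x_sim = [rng.randint(1, 40) for _ in range(n_draws)]
```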

The data space $\mathbb{X} = \{1, 2, \ldots, 40\}^{1114}$ is the set of every possible sequence of ball numbers that we could have got in these 1114 draws. There are $40^{1114}$ possible sequences, and our data is just one of these $40^{1114}$ points in the data space.
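Python integers have arbitrary precision, so we can compute the size of the data space exactly. This sketch counts the digits of $40^{1114}$ and, as a sanity check, enumerates the much smaller data space $\{1, 2, \ldots, 40\}^2$ for just two draws with `itertools.product`:

```python
from itertools import product

n_draws = 1114
size = 40 ** n_draws        # |X| = 40^1114, an exact Python integer
n_digits = len(str(size))   # number of decimal digits in |X|

# Tiny case: with 2 draws the data space {1, ..., 40}^2 can be listed
# explicitly and has 40^2 = 1600 points.
tiny_space = list(product(range(1, 41), repeat=2))
```

Here `n_digits` comes out as 1785, so $40^{1114}$ is far too large to enumerate, while the tiny case has the expected $40^2 = 1600$ points.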

We will use our hidden function, which returns the data on ball one (the first ball drawn) as a list. Evaluate the cell below to get the data and confirm that we have data for 1114 draws.

In [ ]: