©2018 Raazesh Sainudiin. Attribution 4.0 International (CC BY 4.0)
Go to https://quakesearch.geonet.org.nz/ and download data on NZ earthquakes.
<img src = "images/GeoNetQuakeSearchDownloadCSV.png" width =800>
In my attempt above, I zoomed out to include both islands of New Zealand (NZ) and chose the `Last Year` button to get one year of data. The `Search` box on this site then gave the following URLs for downloading data. I used the `DOWNLOAD` button to get my own data in the Output Format `CSV` chosen earlier.

https://quakesearch.geonet.org.nz/csv?bbox=163.52051,-49.23912,182.19727,-32.36140&startdate=2017-06-01&enddate=2018-05-17T14:00:00

https://quakesearch.geonet.org.nz/csv?bbox=163.52051,-49.23912,182.19727,-32.36140&startdate=2017-5-17T13:00:00&enddate=2017-06-01
Try to `DOWNLOAD` your own `CSV` data and store it in a file named `my_earthquakes.csv` inside the folder named `data`, which is in the same directory as this notebook. (NOTE: rename the file when you download it so you don't replace the file `earthquakes.csv`!)
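The download step can also be scripted. The sketch below is a minimal illustration that assumes only the URL pattern visible in the two links above (the `bbox`, `startdate` and `enddate` query parameters). The helper names `build_quake_url` and `download_quakes` are hypothetical and not part of any GeoNet client library.

```python
from urllib.request import urlretrieve

BASE_URL = "https://quakesearch.geonet.org.nz/csv"


def build_quake_url(bbox, startdate, enddate):
    """Assemble a QuakeSearch CSV URL in the same shape as the links above.

    bbox is (min_lon, min_lat, max_lon, max_lat); dates are plain strings.
    """
    bbox_text = ",".join(f"{v:.5f}" for v in bbox)
    return f"{BASE_URL}?bbox={bbox_text}&startdate={startdate}&enddate={enddate}"


def download_quakes(url, filename="data/my_earthquakes.csv"):
    """Fetch the CSV and save it (needs network access and an existing data/ folder)."""
    urlretrieve(url, filename)
    return filename


url = build_quake_url((163.52051, -49.23912, 182.19727, -32.36140),
                      "2017-06-01", "2018-05-17T14:00:00")
print(url)
# download_quakes(url)  # uncomment to actually fetch the file
```

Saving to `data/my_earthquakes.csv` by default keeps the shared file `data/earthquakes.csv` untouched.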
%%sh
# print working directory
pwd
%%sh
ls # list contents of working directory
%%sh
# after download you should have the following file in directory named data
ls data
%%sh
# first three lines
head -3 data/earthquakes.csv
%%sh
# last three lines
tail -3 data/earthquakes.csv
%%sh
# number of lines in the file; mnemonic from `man wc`: wc = word count, and the option -l is for lines
wc -l data/earthquakes.csv
%%sh
man wc
We will all work with `data/earthquakes.csv`. This will ensure we are all looking at the same file!
But feel free to play with your own `data/my_earthquakes.csv` on the side.
Grab origin-time, lat, lon, magnitude, depth
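# read the whole file into one string
with open("data/earthquakes.csv") as f:
    reader = f.read()

# split the string into a list of lines
dataList = reader.split('\n')
len(dataList)
dataList[0]  # the header line

myDataAccumulatorList = []
# skip the header line and the last two entries of dataList
for data in dataList[1:-2]:
    dataRow = data.split(',')
    # keep columns 4, 5 and 6 (column 7 is left out)
    myData = [dataRow[4], dataRow[5], dataRow[6]]  # , dataRow[7]]
    myFloatData = tuple([float(x) for x in myData])
    myDataAccumulatorList.append(myFloatData)

# plot the 3-tuples as points
points(myDataAccumulatorList)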
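The same extraction can be sketched with Python's standard `csv` module. This is an alternative to the loop above, not a replacement for it: the helper name `parse_quake_rows` is hypothetical, the column indices 4, 5 and 6 are taken from the loop above, and the sample lines are made up purely for illustration (they are not real GeoNet records).

```python
import csv


def parse_quake_rows(lines, columns=(4, 5, 6)):
    """Return a list of float tuples taken from the given column indices.

    The header line is skipped, and so are blank lines, which avoids
    hard-coding how many trailing entries to drop.
    """
    rows = csv.reader(lines)
    next(rows, None)  # skip the header line
    parsed = []
    for row in rows:
        if not row:  # csv.reader yields [] for an empty line
            continue
        parsed.append(tuple(float(row[i]) for i in columns))
    return parsed


# Made-up lines, for illustration only (not real GeoNet records).
sample_lines = [
    "c0,c1,c2,c3,c4,c5,c6,c7",
    "id1,quake,t1,t2,170.5,-43.25,2.5,12.0",
    "id2,quake,t3,t4,175.0,-39.5,3.0,30.0",
    "",
]
sample_points = parse_quake_rows(sample_lines)
print(sample_points)
```

With the real file, an open file object can be passed in place of `sample_lines`, for example inside `with open("data/earthquakes.csv", newline="") as f:`.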
Recall that a statistic is any measurable function of the data: $T(x): \mathbb{X} \rightarrow \mathbb{T}$.
Thus, a statistic $T$ is also an RV that takes values in the space $\mathbb{T}$.
When $x \in \mathbb{X}$ is the observed data, $T(x)=t$ is the observed statistic of the observed data $x$.
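As a small illustration of these definitions, the sketch below treats the sample mean and the sample range as two statistics. The data vector `x` is made up for illustration, and the function names are hypothetical.

```python
def sample_mean(x):
    """A statistic T: maps the data vector x to a single number t."""
    return sum(x) / len(x)


def sample_range(x):
    """Another statistic: maps x to the pair (smallest, largest)."""
    return (min(x), max(x))


x = [2, 4, 4, 6]  # made-up observed data
t = sample_mean(x)  # the observed statistic T(x) = t
print(t, sample_range(x))
```

Here the space $\mathbb{T}$ is the real line for `sample_mean` and a set of pairs for `sample_range`, so different statistics of the same data can take values in different spaces.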
Let's go back to our New Zealand lotto data.
We showed that for New Zealand lotto (40 balls in the machine, numbered $1, 2, \ldots, 40$), the number on the first ball out of the machine can be modelled as a de Moivre$(\frac{1}{40}, \frac{1}{40}, \ldots, \frac{1}{40})$ RV.
We have the New Zealand Lotto results for 1114 draws, from 1 August 1987 to 10 November 2008 (retrieved from the NZ lotto web site: http://lotto.nzpages.co.nz/previousresults.html ).
We can think of this data as $x$, the realisation of a random vector $X = (X_1, X_2,\ldots, X_{1114})$ where $X_1, X_2,\ldots, X_{1114} \overset{IID}{\thicksim} \text{de Moivre}(\frac{1}{40}, \frac{1}{40}, \ldots, \frac{1}{40})$
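To get a feel for this model, one realisation of such a random vector can be simulated. The sketch below is plain Python with the standard `random` module and a fixed seed; the function name `simulate_first_ball` is hypothetical, and the simulated values are pseudo-random numbers, not the real lotto results.

```python
import random


def simulate_first_ball(n_draws=1114, n_balls=40, seed=0):
    """Simulate n_draws IID draws, each equally likely to be 1, 2, ..., n_balls."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [rng.randint(1, n_balls) for _ in range(n_draws)]


simulated = simulate_first_ball()
print(len(simulated), min(simulated), max(simulated))
```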
The data space is the set of every possible sequence of ball numbers that we could have got in these 1114 draws: $\mathbb{X} = \{1, 2, \ldots, 40\}^{1114}$. There are $40^{1114}$ possible sequences, and our data is just one of these $40^{1114}$ possible points in the data space.
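To see how large this data space is, the count $40^{1114}$ can be computed exactly with arbitrary-precision integers. The sketch below uses plain Python syntax (`**` for powers); the variable names are illustrative.

```python
n_balls, n_draws = 40, 1114

# each of the 1114 positions can hold any of the 40 ball numbers
size_of_data_space = n_balls ** n_draws

# number of decimal digits in 40^1114
n_digits = len(str(size_of_data_space))
print(n_digits)  # 1785
```

A number with 1785 digits is far too large to enumerate, which is why we reason about the data space through statistics rather than by listing its points.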
We will use our hidden function that returns the data for the first ball as a list. Evaluate the cell below to get the data and confirm that we have data for 1114 draws.