{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"deletable": false
},
"source": [
"# [Applied Statistics](https://lamastex.github.io/scalable-data-science/as/2019/)\n",
"## 1MS926, Spring 2019, Uppsala University \n",
"©2019 Raazesh Sainudiin. [Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 06. Statistics from Data: Fetching New Zealand Earthquakes & Live Play with `data/`\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- Live Data-fetch of NZ EQ Data\n",
"- More on Statistics\n",
"- Sample Mean\n",
"- Sample Variance\n",
"- Order Statistics\n",
"- Frequencies\n",
"- Empirical Mass Function\n",
"- Empirical Distribution Function\n",
"- List Comprehensions\n",
"- New Zealand Earthquakes\n",
"- Live Play with `data/`\n",
" - Swedish election data\n",
" - Biergartens in Germany\n",
" \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Live Data-fetching Exercise Now\n",
"\n",
"Go to [https://quakesearch.geonet.org.nz/](https://quakesearch.geonet.org.nz/) and download data on NZ earthquakes.\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In my attempt above to zoom out to include both islands of New Zealand (NZ) and get one year of data using the `Last Year` button choice from this site:\n",
" - [https://quakesearch.geonet.org.nz/](https://quakesearch.geonet.org.nz/)\n",
"and hitting `Search` box gave the following URLs for downloading data. I used the `DOWNLOAD` button to get my own data in Outpur Format `CSV` as chosen earlier.\n",
"\n",
"https://quakesearch.geonet.org.nz/csv?bbox=163.52051,-49.23912,182.19727,-32.36140&startdate=2017-06-01&enddate=2018-05-17T14:00:00\n",
"https://quakesearch.geonet.org.nz/csv?bbox=163.52051,-49.23912,182.19727,-32.36140&startdate=2017-5-17T13:00:00&enddate=2017-06-01\n",
"\n",
"## What should you do now?\n",
"\n",
"Try to `DOWNLOAD` your own `CSV` data and store it in a file named **`my_earthquakes.csv`** (NOTE: rename the file when you download so you don't replace the file `earthquakes.csv`!) inside the folder named **`data`** that is inside the same directory that this notebook is in."
]
},
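{
"cell_type": "markdown",
"metadata": {},
"source": [
"The download can also be scripted with the Python standard library. A minimal sketch (assuming network access): the URL below is just the first example query above, so adjust the bounding box and dates to your own search. The download line is left commented out so nothing is fetched by accident."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: fetch a GeoNet CSV query result and save it under data/\n",
"from urllib.request import urlretrieve\n",
"\n",
"url = ('https://quakesearch.geonet.org.nz/csv?bbox=163.52051,-49.23912,'\n",
"       '182.19727,-32.36140&startdate=2017-06-01&enddate=2018-05-17T14:00:00')\n",
"#urlretrieve(url, 'data/my_earthquakes.csv')  # uncomment to actually download"
]
},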
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/Users/raazsainudiin/all/git/private/admin/uu/courses/2019Spring_AppliedStats-STS/_as/master/jp\n"
]
}
],
"source": [
"%%sh\n",
"# print working directory\n",
"pwd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"00.ipynb\n",
"01.ipynb\n",
"02.ipynb\n",
"03.ipynb\n",
"04.ipynb\n",
"05.ipynb\n",
"06.ipynb\n",
"07.ipynb\n",
"08.ipynb\n",
"09.ipynb\n",
"10.ipynb\n",
"11.ipynb\n",
"13.ipynb\n",
"SimpleLinearRegression.ipynb\n",
"data\n",
"images\n",
"linear-regression-from-scratch.ipynb\n",
"myHist.png\n"
]
}
],
"source": [
"%%sh\n",
"ls # list contents of working directory"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NYPowerBall.csv\n",
"co2_mm_mlo.txt\n",
"earthquakes.csv\n",
"earthquakes.csv.zip\n",
"earthquakes.tgz\n",
"earthquakes_small.csv\n",
"final.csv\n",
"final.csv.zip\n",
"final.tgz\n",
"pride_and_prejudice.txt\n",
"rainfallInChristchurch.csv\n"
]
}
],
"source": [
"%%sh\n",
"# after download you should have the following file in directory named data\n",
"ls data"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"publicid,eventtype,origintime,modificationtime,longitude, latitude, magnitude, depth,magnitudetype,depthtype,evaluationmethod,evaluationstatus,evaluationmode,earthmodel,usedphasecount,usedstationcount,magnitudestationcount,minimumdistance,azimuthalgap,originerror,magnitudeuncertainty\n",
"2018p371534,,2018-05-18T11:13:48.826Z,2018-05-18T11:15:55.741Z,176.469659,-38.10063545,2.123583253,93.125,M,,NonLinLoc,,automatic,nz3drx,18,18,11,0.3996779802,94.08602902,1.036195008,0\n",
"2018p371524,,2018-05-18T11:08:07.588Z,2018-05-18T11:11:14.319Z,176.4213445,-38.63584892,2.570467678,35.9375,M,,NonLinLoc,,automatic,nz3drx,22,22,11,0.3208135882,89.12864378,1.012353739,0\n"
]
}
],
"source": [
"%%sh \n",
"# first three lines\n",
"head -3 data/earthquakes_small.csv"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2018p352775,,2018-05-11T12:38:54.732Z,2018-05-11T12:40:28.518Z,175.6063627,-40.81585537,1.835272336,13.671875,M,,NonLinLoc,,automatic,nz3drx,22,22,12,0.1097369199,84.14006379,0.3314536834,0\n",
"2018p352725,,2018-05-11T12:12:36.343Z,2018-05-11T12:14:42.372Z,176.0372811,-38.78743116,2.103529946,76.25,M,,NonLinLoc,,automatic,nz3drx,17,17,4,0.4257033383,244.4056741,1.445270768,0\n",
"2018p352684,,2018-05-11T11:50:06.019Z,2018-05-11T11:51:41.163Z,176.5437111,-40.07042442,1.503468463,13.671875,M,,NonLinLoc,,automatic,nz3drx,13,13,7,0.079302248,81.46123042,0.4485324555,0\n"
]
}
],
"source": [
"%%sh \n",
"# last three lines\n",
"tail -3 data/earthquakes_small.csv"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"411 data/earthquakes_small.csv\n"
]
}
],
"source": [
"%%sh \n",
"# number of lines in the file; menmonic from `man wc` is wc = word-count option=-l is for lines\n",
"wc -l data/earthquakes_small.csv"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"#%%sh\n",
"#man wc"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Let's analyse the measured earth quakes in `data/earthquakes.csv`\n",
"\n",
"This will ensure we are all looking at the same file!\n",
"\n",
"But feel free to play with your own `data/my_earthquakes.csv` on the side."
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"### Exercise:\n",
"Grab origin-time, lat, lon, magnitude, depth"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"with open(\"data/earthquakes_small.csv\") as f:\n",
" reader = f.read()\n",
" \n",
"dataList = reader.split('\\n')"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"412"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(dataList)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'publicid,eventtype,origintime,modificationtime,longitude, latitude, magnitude, depth,magnitudetype,depthtype,evaluationmethod,evaluationstatus,evaluationmode,earthmodel,usedphasecount,usedstationcount,magnitudestationcount,minimumdistance,azimuthalgap,originerror,magnitudeuncertainty'"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dataList[0]"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"myDataAccumulatorList =[]\n",
"for data in dataList[1:-2]:\n",
" dataRow = data.split(',')\n",
" myData = [dataRow[4],dataRow[5],dataRow[6]]#,dataRow[7]]\n",
" myFloatData = tuple([float(x) for x in myData])\n",
" myDataAccumulatorList.append(myFloatData)"
]
},
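{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an aside, Python's standard `csv` module does the line- and comma-splitting for us (and also handles quoted fields, which plain `split(',')` would not). A sketch of the same extraction of longitude, latitude and magnitude (columns 4, 5 and 6) using `csv.reader`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import csv\n",
"\n",
"with open('data/earthquakes_small.csv') as f:\n",
"    rows = list(csv.reader(f)) # each row becomes a list of field strings\n",
"\n",
"# skip the header row; keep (longitude, latitude, magnitude) as floats\n",
"eqTuples = [tuple(float(x) for x in row[4:7]) for row in rows[1:]]"
]
},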
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"