Million Song Dataset - Kaggle Challenge
Predict which songs a user will listen to.
SOURCE: This is a Scala-rification of the Python notebook published in Databricks Community Edition in 2016.
Stage 2: Exploring songs data
This is the second notebook in this tutorial. In this notebook we do what any data scientist does with their data right after parsing it: explore and understand different aspects of the data. Make sure you understand how we get the songsTable by reading and running the ETL notebook. In the ETL notebook we created and cached a temporary table named songsTable.
Let's do all the main bits of Stage 1 now, before starting Stage 2 in this notebook.
// Let's quickly do everything to register the tempView of the table here
// fill in comment ... EXERCISE!
case class Song(artist_id: String, artist_latitude: Double, artist_longitude: Double, artist_location: String, artist_name: String, duration: Double, end_of_fade_in: Double, key: Int, key_confidence: Double, loudness: Double, release: String, song_hotness: Double, song_id: String, start_of_fade_out: Double, tempo: Double, time_signature: Double, time_signature_confidence: Double, title: String, year: Double, partial_sequence: Int)
def parseLine(line: String): Song = {
// fill in comment ...
def toDouble(value: String, defaultVal: Double): Double = {
try {
value.toDouble
} catch {
case e: Exception => defaultVal
}
}
def toInt(value: String, defaultVal: Int): Int = {
try {
value.toInt
} catch {
case e: Exception => defaultVal
}
}
// fill in comment ...
val tokens = line.split("\t")
Song(tokens(0), toDouble(tokens(1), 0.0), toDouble(tokens(2), 0.0), tokens(3), tokens(4), toDouble(tokens(5), 0.0), toDouble(tokens(6), 0.0), toInt(tokens(7), -1), toDouble(tokens(8), 0.0), toDouble(tokens(9), 0.0), tokens(10), toDouble(tokens(11), 0.0), tokens(12), toDouble(tokens(13), 0.0), toDouble(tokens(14), 0.0), toDouble(tokens(15), 0.0), toDouble(tokens(16), 0.0), tokens(17), toDouble(tokens(18), 0.0), toInt(tokens(19), -1))
}
// this loads all the data - a subset of the 1M songs dataset
val dataRDD = sc.textFile("/datasets/sds/songs/data-001/part-*")
// .. fill in comment
val df = dataRDD.map(parseLine).toDF
// .. fill in comment
df.createOrReplaceTempView("songsTable")
defined class Song
parseLine: (line: String)Song
dataRDD: org.apache.spark.rdd.RDD[String] = /datasets/sds/songs/data-001/part-* MapPartitionsRDD[236] at textFile at command-2971213210276755:30
df: org.apache.spark.sql.DataFrame = [artist_id: string, artist_latitude: double ... 18 more fields]
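The parsing pattern used in `parseLine` (split on tabs, convert each token, fall back to a default when conversion fails) can also be written with `scala.util.Try`. The following is only a minimal sketch on a cut-down record: `MiniSong`, `toDoubleOr`, `toIntOr` and `parseMini` are hypothetical names introduced for illustration and are not part of the ETL notebook.

```scala
import scala.util.Try

// Hypothetical, cut-down record with only three fields (not the notebook's Song class).
case class MiniSong(artistName: String, duration: Double, key: Int)

// Convert a token to Double, returning the default if the conversion throws.
def toDoubleOr(value: String, default: Double): Double =
  Try(value.toDouble).getOrElse(default)

// Convert a token to Int, returning the default if the conversion throws.
def toIntOr(value: String, default: Int): Int =
  Try(value.toInt).getOrElse(default)

// Split on tabs; a limit of -1 keeps trailing empty fields so the indices stay stable.
def parseMini(line: String): MiniSong = {
  val tokens = line.split("\t", -1)
  MiniSong(tokens(0), toDoubleOr(tokens(1), 0.0), toIntOr(tokens(2), -1))
}

// One well-formed line and one with unparseable numeric fields.
val ok = parseMini("Wavves\t133.25016\t0")
val bad = parseMini("Gossip\tn/a\t")
```

Here `bad` falls back to `0.0` and `-1`, the same defaults `parseLine` uses, so a default value in the table may stand for a missing or malformed field rather than a real measurement.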
spark.catalog.listTables.show(false) // make sure the temp view of our table is there
+----------------------------+--------+-----------+---------+-----------+
|name |database|description|tableType|isTemporary|
+----------------------------+--------+-----------+---------+-----------+
|all_prices |default |null |MANAGED |false |
|bitcoin_normed_window |default |null |MANAGED |false |
|bitcoin_reversals_window |default |null |MANAGED |false |
|countrycodes |default |null |EXTERNAL |false |
|gold_normed_window |default |null |MANAGED |false |
|gold_reversals_window |default |null |MANAGED |false |
|ltcar_locations_2_csv |default |null |MANAGED |false |
|magellan |default |null |MANAGED |false |
|mobile_sample |default |null |EXTERNAL |false |
|oil_normed_window |default |null |MANAGED |false |
|oil_reversals_window |default |null |MANAGED |false |
|oil_reversals_window2 |default |null |MANAGED |false |
|over300all_2_txt |default |null |MANAGED |false |
|person |default |null |MANAGED |false |
|personer |default |null |MANAGED |false |
|persons |default |null |MANAGED |false |
|simple_range |default |null |MANAGED |false |
|social_media_usage |default |null |MANAGED |false |
|social_media_usage_csv_gui |default |null |MANAGED |false |
|voronoi20191213uppsla1st_txt|default |null |MANAGED |false |
+----------------------------+--------+-----------+---------+-----------+
only showing top 20 rows
A first inspection
A first step in any data exploration is viewing sample data. For this purpose we can use a simple SQL query that returns the first 10 rows.
select * from songsTable limit 10
artist_id | artist_latitude | artist_longitude | artist_location | artist_name | duration | end_of_fade_in | key | key_confidence | loudness | release | song_hotness | song_id | start_of_fade_out | tempo | time_signature | time_signature_confidence | title | year | partial_sequence |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AR81V6H1187FB48872 | 0.0 | 0.0 | | Earl Sixteen | 213.7073 | 0.0 | 11.0 | 0.419 | -12.106 | Soldier of Jah Army | 0.0 | SOVNZSZ12AB018A9B8 | 208.289 | 125.882 | 1.0 | 0.0 | Rastaman | 2003.0 | -1.0 |
ARVVZQP11E2835DBCB | 0.0 | 0.0 | | Wavves | 133.25016 | 0.0 | 0.0 | 0.282 | 0.596 | Wavvves | 0.471578247701 | SOJTQHQ12A8C143C5F | 128.116 | 89.519 | 1.0 | 0.0 | I Want To See You (And Go To The Movies) | 2009.0 | -1.0 |
ARFG9M11187FB3BBCB | 0.0 | 0.0 | Nashua USA | C-Side | 247.32689 | 0.0 | 9.0 | 0.612 | -4.896 | Santa Festival Compilation 2008 vol.1 | 0.0 | SOAJSQL12AB0180501 | 242.196 | 171.278 | 5.0 | 1.0 | Loose on the Dancefloor | 0.0 | 225261.0 |
ARK4Z2O1187FB45FF0 | 0.0 | 0.0 | | Harvest | 337.05751 | 0.247 | 4.0 | 0.46 | -9.092 | Underground Community | 0.0 | SOTDRVW12AB018BEB9 | 327.436 | 84.986 | 4.0 | 0.673 | No Return | 0.0 | 101619.0 |
AR4VQSG1187FB57E18 | 35.25082 | -91.74015 | Searcy, AR | Gossip | 430.23628 | 0.0 | 2.0 | 3.4e-2 | -6.846 | Yr Mangled Heart | 0.0 | SOTVOCL12A8AE478DD | 424.06 | 121.998 | 4.0 | 0.847 | Yr Mangled Heart | 2006.0 | 740623.0 |
ARNBV1X1187B996249 | 0.0 | 0.0 | | Alex | 186.80118 | 0.0 | 4.0 | 0.641 | -16.108 | Jolgaledin | 0.0 | SODTGRY12AB0182438 | 166.156 | 140.735 | 4.0 | 5.5e-2 | Mariu Sonur Jesus | 0.0 | 673970.0 |
ARXOEZX1187B9B82A1 | 0.0 | 0.0 | | Elie Attieh | 361.89995 | 0.0 | 7.0 | 0.863 | -4.919 | ELITE | 0.0 | SOIINTJ12AB0180BA6 | 354.476 | 128.024 | 4.0 | 0.399 | Fe Yom We Leila | 0.0 | 280304.0 |
ARXPUIA1187B9A32F1 | 0.0 | 0.0 | Rome, Italy | Simone Cristicchi | 220.00281 | 2.119 | 4.0 | 0.486 | -6.52 | Dall'Altra Parte Del Cancello | 0.484225272411 | SONHXJK12AAF3B5290 | 214.761 | 99.954 | 1.0 | 0.928 | L'Italiano | 2007.0 | 745962.0 |
ARNPPTH1187B9AD429 | 51.4855 | -0.37196 | Heston, Middlesex, England | Jimmy Page | 156.86485 | 0.334 | 7.0 | 0.493 | -9.962 | No Introduction Necessary [Deluxe Edition] | 0.0 | SOGUHGW12A58A80E06 | 149.269 | 162.48 | 4.0 | 0.534 | Wailing Sounds | 2004.0 | 599250.0 |
AROGWRA122988FEE45 | 0.0 | 0.0 | | Christos Dantis | 256.67873 | 2.537 | 9.0 | 0.742 | -13.404 | Daktilika Apotipomata | 0.0 | SOJJOYI12A8C13399D | 248.912 | 134.944 | 4.0 | 0.162 | Stin Proigoumeni Zoi | 0.0 | 611396.0 |
table("songsTable").printSchema()
root
|-- artist_id: string (nullable = true)
|-- artist_latitude: double (nullable = false)
|-- artist_longitude: double (nullable = false)
|-- artist_location: string (nullable = true)
|-- artist_name: string (nullable = true)
|-- duration: double (nullable = false)
|-- end_of_fade_in: double (nullable = false)
|-- key: integer (nullable = false)
|-- key_confidence: double (nullable = false)
|-- loudness: double (nullable = false)
|-- release: string (nullable = true)
|-- song_hotness: double (nullable = false)
|-- song_id: string (nullable = true)
|-- start_of_fade_out: double (nullable = false)
|-- tempo: double (nullable = false)
|-- time_signature: double (nullable = false)
|-- time_signature_confidence: double (nullable = false)
|-- title: string (nullable = true)
|-- year: double (nullable = false)
|-- partial_sequence: integer (nullable = false)
select count(*) from songsTable
count(1) |
---|
31369.0 |
table("songsTable").count() // or equivalently with DataFrame API - recall table("songsTable") is a DataFrame
res4: Long = 31369
display(sqlContext.sql("SELECT duration, year FROM songsTable")) // Aggregation is set to 'Average' in 'Plot Options'
duration | year |
---|---|
213.7073 | 2003.0 |
133.25016 | 2009.0 |
247.32689 | 0.0 |
337.05751 | 0.0 |
430.23628 | 2006.0 |
186.80118 | 0.0 |
361.89995 | 0.0 |
220.00281 | 2007.0 |
156.86485 | 2004.0 |
256.67873 | 0.0 |
(Remaining rows of the 1000-row display output omitted. In the notebook this result is rendered as a chart with the 'Average' aggregation selected in 'Plot Options'.)
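The 'Average' aggregation in 'Plot Options' presumably averages `duration` within each `year`. As a rough illustration of that computation, here is a plain Scala sketch over a handful of (duration, year) rows taken from this query's output. The helper name `averageDurationByYear` is hypothetical. The sketch also assumes that a year of 0.0 means the release year is unknown (0.0 is the default `parseLine` falls back to), so those rows are dropped before averaging.

```scala
// A few (duration, year) rows taken from the query output.
val sample: Seq[(Double, Double)] = Seq(
  (213.7073, 2003.0),
  (133.25016, 2009.0),
  (247.32689, 0.0),
  (337.05751, 0.0),
  (430.23628, 2006.0),
  (51.04281, 2009.0),
  (253.33506, 2003.0)
)

// Drop rows whose year is the 0.0 placeholder (assumed to mean "unknown"),
// group the rest by year, and average the durations within each group.
def averageDurationByYear(rows: Seq[(Double, Double)]): Map[Int, Double] =
  rows
    .filter { case (_, year) => year > 0 }
    .groupBy { case (_, year) => year.toInt }
    .map { case (year, group) => year -> group.map(_._1).sum / group.size }

val averages = averageDurationByYear(sample)
```

On the full table the same idea would typically be expressed in Spark with `groupBy("year")` and `avg("duration")` after filtering out the placeholder year. Leaving the 0.0 rows in would add a large, meaningless group at year 0 to the plot.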
Exercises
- Why do you think average song durations increase dramatically in the 1970s?
- Add error bars with standard deviation around each average point in the plot.
- How did average loudness change over time?
- How did tempo change over time?
- What other aspects of songs can you explore with this technique?
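As a possible starting point for the error-bar exercise, the sketch below computes a mean and a sample standard deviation for one group of values, and from them the lower and upper ends of an error bar. The names `meanAndStd` and `errorBar` are hypothetical, and returning 0.0 as the standard deviation of a single value is a choice made here for plotting convenience, not something the notebook prescribes.

```scala
// Mean and sample standard deviation (n - 1 in the denominator) of a non-empty sequence.
def meanAndStd(values: Seq[Double]): (Double, Double) = {
  require(values.nonEmpty, "need at least one value")
  val mean = values.sum / values.size
  val std =
    if (values.size < 2) 0.0 // assumption: treat a single observation as having no spread
    else math.sqrt(values.map(v => math.pow(v - mean, 2)).sum / (values.size - 1))
  (mean, std)
}

// Error-bar bounds for one average point: mean minus and plus one standard deviation.
def errorBar(values: Seq[Double]): (Double, Double) = {
  val (mean, std) = meanAndStd(values)
  (mean - std, mean + std)
}

// Durations of three songs with year 2009.0 from the songsTable query output.
val durations2009 = Seq(133.25016, 51.04281, 275.12118)
val (lower2009, upper2009) = errorBar(durations2009)
```

Applying `errorBar` to the durations of each year (or decade) gives one interval per average point in the plot.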
Sampling and visualizing
You can dive deep into Scala visualisations here:
- https://docs.databricks.com/notebooks/visualizations/charts-and-graphs-scala.html
You can also use R and Python for visualisations in the same notebook:
- https://docs.databricks.com/notebooks/visualizations/index.html