ScaDaMaLe Course site and book

Million Song Dataset - Kaggle Challenge

Predict which songs a user will listen to.

SOURCE: This is a Scala-rification of the Python notebook published in Databricks Community Edition in 2016.

Stage 2: Exploring songs data

This is the second notebook in this tutorial. In this notebook we do what any data scientist does with their data right after parsing it: explore and understand its different aspects. Make sure you understand how we get the songsTable by reading and running the ETL notebook. In the ETL notebook we created and cached a temporary table named songsTable.

Let's do all the main bits of Stage 1 now, before doing Stage 2 in this notebook.

// Let's quickly do everything to register the tempView of the table here

// fill in comment ... EXERCISE!
case class Song(artist_id: String, artist_latitude: Double, artist_longitude: Double, artist_location: String, artist_name: String, duration: Double, end_of_fade_in: Double, key: Int, key_confidence: Double, loudness: Double, release: String, song_hotness: Double, song_id: String, start_of_fade_out: Double, tempo: Double, time_signature: Double, time_signature_confidence: Double, title: String, year: Double, partial_sequence: Int)

def parseLine(line: String): Song = {
  // fill in comment ...
  
  def toDouble(value: String, defaultVal: Double): Double = {
    try {
      value.toDouble
    } catch {
      case e: Exception => defaultVal
    }
  }

  def toInt(value: String, defaultVal: Int): Int = {
    try {
      value.toInt
    } catch {
      case e: Exception => defaultVal
    }
  }
  // fill in comment ...
  val tokens = line.split("\t")
  Song(tokens(0), toDouble(tokens(1), 0.0), toDouble(tokens(2), 0.0), tokens(3), tokens(4), toDouble(tokens(5), 0.0), toDouble(tokens(6), 0.0), toInt(tokens(7), -1), toDouble(tokens(8), 0.0), toDouble(tokens(9), 0.0), tokens(10), toDouble(tokens(11), 0.0), tokens(12), toDouble(tokens(13), 0.0), toDouble(tokens(14), 0.0), toDouble(tokens(15), 0.0), toDouble(tokens(16), 0.0), tokens(17), toDouble(tokens(18), 0.0), toInt(tokens(19), -1))
}

// this loads all the data - a subset of the 1M songs dataset
val dataRDD = sc.textFile("/datasets/sds/songs/data-001/part-*") 

// .. fill in comment
val df = dataRDD.map(parseLine).toDF

// .. fill in comment
df.createOrReplaceTempView("songsTable")
defined class Song
parseLine: (line: String)Song
dataRDD: org.apache.spark.rdd.RDD[String] = /datasets/sds/songs/data-001/part-* MapPartitionsRDD[236] at textFile at command-2971213210276755:30
df: org.apache.spark.sql.DataFrame = [artist_id: string, artist_latitude: double ... 18 more fields]
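
The helpers inside parseLine fall back to a default value whenever a field cannot be parsed. The sketch below shows the same idea with scala.util.Try, using hypothetical names (parseDoubleOr, parseIntOr, splitTabs) and a made-up sample line that are not part of the original notebook. It also illustrates one assumption worth checking on your own data: String.split drops trailing empty fields unless a negative limit is passed, so a line ending in empty fields could yield fewer than 20 tokens.

```scala
import scala.util.Try

// Hypothetical helpers mirroring toDouble / toInt in parseLine.
def parseDoubleOr(value: String, default: Double): Double =
  Try(value.toDouble).getOrElse(default)

def parseIntOr(value: String, default: Int): Int =
  Try(value.toInt).getOrElse(default)

// A negative limit keeps trailing empty fields.
def splitTabs(line: String): Array[String] = line.split("\t", -1)

// Made-up line: an id, one number, and two empty trailing fields.
val sample = "AR123\t35.25\t\t"
val tokens = splitTabs(sample)

println(sample.split("\t").length)       // 2: trailing empty fields dropped
println(tokens.length)                   // 4: trailing empty fields kept
println(parseDoubleOr(tokens(1), 0.0))   // 35.25
println(parseDoubleOr(tokens(2), 0.0))   // 0.0, the default for an empty field
println(parseIntOr("11.0", -1))          // -1, since "11.0" is not a valid Int
```

Whether the default should be 0.0 or something that cannot be mistaken for real data (for example Double.NaN or an Option) is a design choice; the notebook uses 0.0 and -1.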
spark.catalog.listTables.show(false) // make sure the temp view of our table is there
+----------------------------+--------+-----------+---------+-----------+
|name                        |database|description|tableType|isTemporary|
+----------------------------+--------+-----------+---------+-----------+
|all_prices                  |default |null       |MANAGED  |false      |
|bitcoin_normed_window       |default |null       |MANAGED  |false      |
|bitcoin_reversals_window    |default |null       |MANAGED  |false      |
|countrycodes                |default |null       |EXTERNAL |false      |
|gold_normed_window          |default |null       |MANAGED  |false      |
|gold_reversals_window       |default |null       |MANAGED  |false      |
|ltcar_locations_2_csv       |default |null       |MANAGED  |false      |
|magellan                    |default |null       |MANAGED  |false      |
|mobile_sample               |default |null       |EXTERNAL |false      |
|oil_normed_window           |default |null       |MANAGED  |false      |
|oil_reversals_window        |default |null       |MANAGED  |false      |
|oil_reversals_window2       |default |null       |MANAGED  |false      |
|over300all_2_txt            |default |null       |MANAGED  |false      |
|person                      |default |null       |MANAGED  |false      |
|personer                    |default |null       |MANAGED  |false      |
|persons                     |default |null       |MANAGED  |false      |
|simple_range                |default |null       |MANAGED  |false      |
|social_media_usage          |default |null       |MANAGED  |false      |
|social_media_usage_csv_gui  |default |null       |MANAGED  |false      |
|voronoi20191213uppsla1st_txt|default |null       |MANAGED  |false      |
+----------------------------+--------+-----------+---------+-----------+
only showing top 20 rows
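
Because the listing above stops after 20 rows, the temp view may not appear in it even when it is registered. A more direct check can be sketched as follows. The helper names tempViewNames and hasView are hypothetical, and the code assumes the notebook's SparkSession is passed in as the argument.

```scala
import org.apache.spark.sql.SparkSession

// Names of all temporary views visible in the given session.
def tempViewNames(spark: SparkSession): Seq[String] =
  spark.catalog.listTables().collect().filter(_.isTemporary).map(_.name).toSeq

// True if a table or view is registered under this name.
def hasView(spark: SparkSession, name: String): Boolean =
  spark.catalog.tableExists(name)
```

In the notebook this could be used as hasView(spark, "songsTable"), or as tempViewNames(spark).foreach(println) to list only the temporary views. The catalog may report view names in lower case, so compare names ignoring case.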

A first inspection

A first step in any data exploration is viewing sample data. For this purpose we can use a simple SQL query that returns the first 10 rows.

select * from songsTable limit 10
artist_id artist_latitude artist_longitude artist_location artist_name duration end_of_fade_in key key_confidence loudness release song_hotness song_id start_of_fade_out tempo time_signature time_signature_confidence title year partial_sequence
AR81V6H1187FB48872 0.0 0.0 Earl Sixteen 213.7073 0.0 11.0 0.419 -12.106 Soldier of Jah Army 0.0 SOVNZSZ12AB018A9B8 208.289 125.882 1.0 0.0 Rastaman 2003.0 -1.0
ARVVZQP11E2835DBCB 0.0 0.0 Wavves 133.25016 0.0 0.0 0.282 0.596 Wavvves 0.471578247701 SOJTQHQ12A8C143C5F 128.116 89.519 1.0 0.0 I Want To See You (And Go To The Movies) 2009.0 -1.0
ARFG9M11187FB3BBCB 0.0 0.0 Nashua USA C-Side 247.32689 0.0 9.0 0.612 -4.896 Santa Festival Compilation 2008 vol.1 0.0 SOAJSQL12AB0180501 242.196 171.278 5.0 1.0 Loose on the Dancefloor 0.0 225261.0
ARK4Z2O1187FB45FF0 0.0 0.0 Harvest 337.05751 0.247 4.0 0.46 -9.092 Underground Community 0.0 SOTDRVW12AB018BEB9 327.436 84.986 4.0 0.673 No Return 0.0 101619.0
AR4VQSG1187FB57E18 35.25082 -91.74015 Searcy, AR Gossip 430.23628 0.0 2.0 3.4e-2 -6.846 Yr Mangled Heart 0.0 SOTVOCL12A8AE478DD 424.06 121.998 4.0 0.847 Yr Mangled Heart 2006.0 740623.0
ARNBV1X1187B996249 0.0 0.0 Alex 186.80118 0.0 4.0 0.641 -16.108 Jolgaledin 0.0 SODTGRY12AB0182438 166.156 140.735 4.0 5.5e-2 Mariu Sonur Jesus 0.0 673970.0
ARXOEZX1187B9B82A1 0.0 0.0 Elie Attieh 361.89995 0.0 7.0 0.863 -4.919 ELITE 0.0 SOIINTJ12AB0180BA6 354.476 128.024 4.0 0.399 Fe Yom We Leila 0.0 280304.0
ARXPUIA1187B9A32F1 0.0 0.0 Rome, Italy Simone Cristicchi 220.00281 2.119 4.0 0.486 -6.52 Dall'Altra Parte Del Cancello 0.484225272411 SONHXJK12AAF3B5290 214.761 99.954 1.0 0.928 L'Italiano 2007.0 745962.0
ARNPPTH1187B9AD429 51.4855 -0.37196 Heston, Middlesex, England Jimmy Page 156.86485 0.334 7.0 0.493 -9.962 No Introduction Necessary [Deluxe Edition] 0.0 SOGUHGW12A58A80E06 149.269 162.48 4.0 0.534 Wailing Sounds 2004.0 599250.0
AROGWRA122988FEE45 0.0 0.0 Christos Dantis 256.67873 2.537 9.0 0.742 -13.404 Daktilika Apotipomata 0.0 SOJJOYI12A8C13399D 248.912 134.944 4.0 0.162 Stin Proigoumeni Zoi 0.0 611396.0
table("songsTable").printSchema()
root
 |-- artist_id: string (nullable = true)
 |-- artist_latitude: double (nullable = false)
 |-- artist_longitude: double (nullable = false)
 |-- artist_location: string (nullable = true)
 |-- artist_name: string (nullable = true)
 |-- duration: double (nullable = false)
 |-- end_of_fade_in: double (nullable = false)
 |-- key: integer (nullable = false)
 |-- key_confidence: double (nullable = false)
 |-- loudness: double (nullable = false)
 |-- release: string (nullable = true)
 |-- song_hotness: double (nullable = false)
 |-- song_id: string (nullable = true)
 |-- start_of_fade_out: double (nullable = false)
 |-- tempo: double (nullable = false)
 |-- time_signature: double (nullable = false)
 |-- time_signature_confidence: double (nullable = false)
 |-- title: string (nullable = true)
 |-- year: double (nullable = false)
 |-- partial_sequence: integer (nullable = false)
select count(*) from songsTable
count(1)
31369.0
table("songsTable").count() // or equivalently with DataFrame API - recall table("songsTable") is a DataFrame
res4: Long = 31369
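
Beyond the row count, summary statistics are a quick way to continue a first inspection. The sketch below uses the DataFrame describe method on three numeric columns from the schema above; the helper name summarize is hypothetical. Note that the default values substituted by parseLine (0.0 for unparseable doubles) are included in these statistics.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper: count, mean, stddev, min and max for a few numeric columns.
def summarize(songs: DataFrame): DataFrame =
  songs.describe("duration", "tempo", "loudness")
```

In the notebook this could be called as summarize(spark.table("songsTable")).show().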
display(sqlContext.sql("SELECT duration, year FROM songsTable")) // Aggregation is set to 'Average' in 'Plot Options'
duration year
213.7073 2003.0
133.25016 2009.0
247.32689 0.0
337.05751 0.0
430.23628 2006.0
186.80118 0.0
361.89995 0.0
220.00281 2007.0
156.86485 2004.0
256.67873 0.0
[Remaining rows of the 1000-row display output omitted. In the notebook this result is rendered as a plot with 'Average' aggregation.]
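
The display call above leaves the averaging to the notebook's plot options. The same aggregation can be sketched directly with the DataFrame API. This is a minimal sketch, not the notebook's own method: the helper name durationByYear is hypothetical, and it assumes that a year of 0.0 marks a song whose year is unknown, so those rows are filtered out before averaging.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{avg, col, count, lit, stddev}

// Average duration per year with its spread.
// Assumption: year == 0.0 means "year unknown", so such rows are skipped.
def durationByYear(songs: DataFrame): DataFrame =
  songs
    .filter(col("year") > 0)
    .groupBy("year")
    .agg(
      count(lit(1)).as("n_songs"),
      avg("duration").as("avg_duration"),
      stddev("duration").as("sd_duration"))
    .orderBy("year")
```

In the notebook this could be plotted with display(durationByYear(spark.table("songsTable"))). The n_songs column helps judge how much weight to give each yearly average, since years with only a few songs produce noisy averages.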

Exercises

  1. Why do you think average song durations increase dramatically in the 1970s?
  2. Add error bars with standard deviation around each average point in the plot.
  3. How did average loudness change over time?
  4. How did tempo change over time?
  5. What other aspects of songs can you explore with this technique?

Sampling and visualizing

You can dive deep into Scala visualisations here: https://docs.databricks.com/notebooks/visualizations/charts-and-graphs-scala.html

You can also use R and Python for visualisations in the same notebook: https://docs.databricks.com/notebooks/visualizations/index.html