ScaDaMaLe Course site and book

Historical FX-1-M Financial Data

Johannes Graner (LinkedIn), Albert Nilsson (LinkedIn) and Raazesh Sainudiin (LinkedIn)

2020, Uppsala, Sweden

This project was supported by Combient Mix AB through summer internships at:

Combient Competence Centre for Data Engineering Sciences, Department of Mathematics, Uppsala University, Uppsala, Sweden


Resources

This notebook builds on the following repository in order to obtain SparkSQL DatSets and DataFrames of freely available FX-1-M Data so that they can be ingested into delta.io Tables:

"./000a_finance_utils"

The Trend Calculus library is needed for case classes and parsers for the data.

import org.lamastex.spark.trendcalculus._
import org.lamastex.spark.trendcalculus._
defined object TrendUtils
val filePathRoot = TrendUtils.getFx1mPath

There are many pairs of currencies and/or commodities available.

dbutils.fs.ls(filePathRoot).foreach(fi => println("exchange pair: " + fi.name))
exchange pair: audcad/
exchange pair: audchf/
exchange pair: audjpy/
exchange pair: audnzd/
exchange pair: audusd/
exchange pair: auxaud/
exchange pair: bcousd/
exchange pair: cadchf/
exchange pair: cadjpy/
exchange pair: chfjpy/
exchange pair: etxeur/
exchange pair: euraud/
exchange pair: eurcad/
exchange pair: eurchf/
exchange pair: eurczk/
exchange pair: eurdkk/
exchange pair: eurgbp/
exchange pair: eurhuf/
exchange pair: eurjpy/
exchange pair: eurnok/
exchange pair: eurnzd/
exchange pair: eurpln/
exchange pair: eursek/
exchange pair: eurtry/
exchange pair: eurusd/
exchange pair: frxeur/
exchange pair: gbpaud/
exchange pair: gbpcad/
exchange pair: gbpchf/
exchange pair: gbpjpy/
exchange pair: gbpnzd/
exchange pair: gbpusd/
exchange pair: grxeur/
exchange pair: hkxhkd/
exchange pair: jpxjpy/
exchange pair: nsxusd/
exchange pair: nzdcad/
exchange pair: nzdchf/
exchange pair: nzdjpy/
exchange pair: nzdusd/
exchange pair: sgdjpy/
exchange pair: spxusd/
exchange pair: udxusd/
exchange pair: ukxgbp/
exchange pair: usdcad/
exchange pair: usdchf/
exchange pair: usdczk/
exchange pair: usddkk/
exchange pair: usdhkd/
exchange pair: usdhuf/
exchange pair: usdjpy/
exchange pair: usdmxn/
exchange pair: usdnok/
exchange pair: usdpln/
exchange pair: usdsek/
exchange pair: usdsgd/
exchange pair: usdtry/
exchange pair: usdzar/
exchange pair: wtiusd/
exchange pair: xagusd/
exchange pair: xauaud/
exchange pair: xauchf/
exchange pair: xaueur/
exchange pair: xaugbp/
exchange pair: xauusd/
exchange pair: zarjpy/

Let's look at Brent Oil price in USD.

dbutils.fs.ls(filePathRoot + "bcousd/").foreach(fi => println("name: " + fi.name + ", size: " + fi.size))
name: DAT_ASCII_BCOUSD_M1_2010.csv.gz, size: 284384
name: DAT_ASCII_BCOUSD_M1_2010.txt.gz, size: 41157
name: DAT_ASCII_BCOUSD_M1_2011.csv.gz, size: 2479115
name: DAT_ASCII_BCOUSD_M1_2011.txt.gz, size: 216327
name: DAT_ASCII_BCOUSD_M1_2012.csv.gz, size: 2327511
name: DAT_ASCII_BCOUSD_M1_2012.txt.gz, size: 321867
name: DAT_ASCII_BCOUSD_M1_2013.csv.gz, size: 2109500
name: DAT_ASCII_BCOUSD_M1_2013.txt.gz, size: 417973
name: DAT_ASCII_BCOUSD_M1_2014.csv.gz, size: 1961172
name: DAT_ASCII_BCOUSD_M1_2014.txt.gz, size: 431591
name: DAT_ASCII_BCOUSD_M1_2015.csv.gz, size: 2205678
name: DAT_ASCII_BCOUSD_M1_2015.txt.gz, size: 333277
name: DAT_ASCII_BCOUSD_M1_2016.csv.gz, size: 2131659
name: DAT_ASCII_BCOUSD_M1_2016.txt.gz, size: 342616
name: DAT_ASCII_BCOUSD_M1_2017.csv.gz, size: 1854793
name: DAT_ASCII_BCOUSD_M1_2017.txt.gz, size: 434781
name: DAT_ASCII_BCOUSD_M1_2018.csv.gz, size: 2251250
name: DAT_ASCII_BCOUSD_M1_2018.txt.gz, size: 306810
name: DAT_ASCII_BCOUSD_M1_2019.csv.gz, size: 2701059
name: DAT_ASCII_BCOUSD_M1_2019.txt.gz, size: 102290
name: DAT_ASCII_BCOUSD_M1_202001.csv.gz, size: 233757
name: DAT_ASCII_BCOUSD_M1_202001.txt.gz, size: 8391
name: DAT_ASCII_BCOUSD_M1_202002.csv.gz, size: 222628
name: DAT_ASCII_BCOUSD_M1_202002.txt.gz, size: 4379
name: DAT_ASCII_BCOUSD_M1_202003.csv.gz, size: 265471
name: DAT_ASCII_BCOUSD_M1_202003.txt.gz, size: 834
name: DAT_ASCII_BCOUSD_M1_202004.csv.gz, size: 245819
name: DAT_ASCII_BCOUSD_M1_202004.txt.gz, size: 953
name: DAT_ASCII_BCOUSD_M1_202005.csv.gz, size: 233828
name: DAT_ASCII_BCOUSD_M1_202005.txt.gz, size: 850
name: DAT_ASCII_BCOUSD_M1_202006.csv.gz, size: 244557
name: DAT_ASCII_BCOUSD_M1_202006.txt.gz, size: 1394
name: DAT_ASCII_BCOUSD_M1_202007.csv.gz, size: 74976
name: DAT_ASCII_BCOUSD_M1_202007.txt.gz, size: 1637

We use the parser available from Trend Calculus to read the csv files into a Spark Dataset.

val oilPath = filePathRoot + "bcousd/*.csv.gz"
val oilDS = spark.read.fx1m(oilPath).orderBy($"time")
oilDS.show(20, false)
+-------------------+-----+-----+-----+-----+------+
|time               |open |high |low  |close|volume|
+-------------------+-----+-----+-----+-----+------+
|2010-11-14 20:15:00|86.73|86.74|86.73|86.74|0     |
|2010-11-14 20:17:00|86.75|86.75|86.75|86.75|0     |
|2010-11-14 20:18:00|86.76|86.78|86.76|86.76|0     |
|2010-11-14 20:19:00|86.74|86.74|86.74|86.74|0     |
|2010-11-14 20:21:00|86.75|86.75|86.74|86.74|0     |
|2010-11-14 20:24:00|86.75|86.75|86.75|86.75|0     |
|2010-11-14 20:26:00|86.76|86.77|86.74|86.77|0     |
|2010-11-14 20:27:00|86.79|86.79|86.75|86.75|0     |
|2010-11-14 20:28:00|86.77|86.79|86.75|86.79|0     |
|2010-11-14 20:32:00|86.8 |86.81|86.79|86.81|0     |
|2010-11-14 20:33:00|86.81|86.81|86.81|86.81|0     |
|2010-11-14 20:34:00|86.81|86.81|86.81|86.81|0     |
|2010-11-14 20:35:00|86.81|86.81|86.79|86.79|0     |
|2010-11-14 20:36:00|86.79|86.8 |86.78|86.8 |0     |
|2010-11-14 20:37:00|86.78|86.8 |86.78|86.79|0     |
|2010-11-14 20:38:00|86.79|86.79|86.79|86.79|0     |
|2010-11-14 20:39:00|86.79|86.8 |86.79|86.79|0     |
|2010-11-14 20:40:00|86.79|86.8 |86.79|86.8 |0     |
|2010-11-14 20:41:00|86.8 |86.8 |86.8 |86.8 |0     |
|2010-11-14 20:42:00|86.8 |86.8 |86.8 |86.8 |0     |
+-------------------+-----+-----+-----+-----+------+
only showing top 20 rows