ScaDaMaLe Course site and book

Trends in Financial Stocks and News Events

Johannes Graner, Albert Nilsson and Raazesh Sainudiin

2020, Uppsala, Sweden

This project was supported by Combient Mix AB through summer internships at:

Combient Competence Centre for Data Engineering Sciences, Department of Mathematics, Uppsala University, Uppsala, Sweden


According to the Merriam-Webster Dictionary, the definition of trend is as follows:

a prevailing tendency or inclination : drift

Since people invest in financial stocks of publicly traded companies and make these decisions based on their understanding of current events reported in mass media, a natural question is:

How can one try to represent and understand this interplay?

The following material first goes through the ETL process to ingest:

  • financial data and then
  • mass-media data

in a structured manner so that one can begin scalable data science processes upon them.

In the sequel, two libraries are used to take advantage of SparkSQL and delta.io tables ("Spark on ACID"):

  • for encoding and interpreting trends (so-called trend calculus) in any time series, such as financial stock prices.
  • for structured representation of the world's largest open-sourced mass media data.

The last few notebooks show some simple data analytics to help extract and identify events that may be related to trends of interest.

Note that the sequel focuses mainly on the data engineering science of ETL and basic ML pipelines. We hope it will inspire others to pursue more advanced research, including scalable causal inference and various forms of distributed deep/reinforcement learning for more complex decision problems.