This course is designed by Raazesh Sainudiin, an Associate Professor in Mathematics with Specialisation in Data Science at the Department of Mathematics, Uppsala University and Consulting Principal Data Scientist at Combient AB in Stockholm.
COURSE
The course will provide an introduction to Data Science simultaneously from a computational, mathematical and statistical perspective.
Style
The approach will use formal mathematical communication of concepts with concomitant development of computer programs.
Lectures will build on: Sets, Maps, Functions, Modular Arithmetic, Axiomatic Probability, Conditional probability, constructive understanding of random variables and structures including graphs, Statistics, Likelihood Principle, Decisions (parametric and non-parametric) including hypothesis testing and estimation.
Everyone should get something from this course and it is meant to help one prototype in the most active open-source mathematically consistent ecosystem based around Python.
Course Materials
- Download the SageMath
.ipynb
notebooks as well asimages
anddata
directories in the following.zip
file: - Unzip it into the directory you launched the SageMath Jupyter notebook server from (see SageMath instrctions below).
- You should be able to see all the jupyter
.ipynb
notebooks by navigating from your jupyter notebook server.
Individual SageMath Jupyter .ipynb
Notebooks - add or replace with new or updated notebooks, as they become available to cover target topics including correlation, linear regression and non/parametric hypothesis testing, etc.
Roughly one notebook will be covered per lab/lecture (we have a total of 12 lectures; some of which can go faster).
We will break regularly for exercises as assignments for maximum retention and immediate brain-muscle memory on your operating systems.
Basic Notebooks
- 00. Introduction
- 01. BASH Crash
- 02. Numbers, Strings, Booleans and Sets
- 03. Map, Function, Collection and Probability
- 04. Conditional Probability, Random Variable, Loop and Conditional
- 05. Random Variables, Expectations, Data, Statistics, Arrays and Tuples, Iterators and Generators
- 06. Statistics from Data: Fetching New Zealand Earthquakes, Counting votes in 2018 Swedish National Election, Locating Biergartens in Germany & Live Play with
data/
- 07. Modular Arithmetic, Linear Congruential Generators, and Pseudo-Random Numbers
- 08. Pseudo-Random Numbers, Simulating from Some Discrete and Continuous Random Variables
- 09. Estimation, Likelihood, Maximum Likelihood Estimators and Regressions
- 10. Convergence of Limits of Random Variables, Confidence Set Estimation and Testing
- 11. Non-parametric Estimation and Testing
- 12. Linear Regression
- 13. Markov Chains and Random Structures
Additional Research Notebooks
- Transmission Process
- Now, we are ready for scalable data engineering sciences in Python and Scala via Apache Spark
- ML
- DL
- geospatial
- NLP
- Network science
This is the GitHub directory containing the SageMath contents for the course:
Assignments for Exercising between Lectures
Assignments will be tried in between lab-lectures regulary:
- first try on your own,
- start cooperating with your course-mates to learn/co-learn/trach,
- the solutions will be displayed only after a good amount of time is devoted to solving the assigned exercises.
Assignments
Software
We will use SageMath locally during face-to-face interactions.
Supplementary Book:
Computer labs
It is strongly recommended that you install SageMath in your laptop or desktop computer for convenience by following the instructions in the next section.
You can also do it in the cloud for free using cocalc.com for collaborative calculation in the cloud. We might do this as the download is happening.
Local: Prepare your laptop on your own
It may be more convenient to install SageMath on your laptop or desktop.
- Follow the download and installation instructions for your Operating System from the following URL:
- To test that you have installed correctly do the following:
- On a Mac OS X or Unix/Linux system, say you installed sage in a directory inside your home directory called
~/all/software/sage/
, then you can see if the following command launches a Jupyter notebook server successfully:$ ~/all/software/sage/SageMath/sage -n jupyter
- Those with Windows should follow the instructions in the following URL and test that the jupyter notebook server launches successfully:
- On a Mac OS X or Unix/Linux system, say you installed sage in a directory inside your home directory called
2. By browser for collaborative calculation in the cloud
As a backup in the cloud:
- Get a free account in cocalc.com, for collaborative calculation in the cloud.
- Upload the course materials as
.zip
file to your account.
Course on GitHub
The source GitHub directory for the course contents is:
YouTube Video Clips for SageMath Setup
- Step 1: How to download Applied Statistics Course Content for the first time:
- Step 2: How to Download SageMath and set it up on your system (done for a similar course):
- Mac OS X proper way watch https://youtu.be/i8AweQ2GyS0
- Linux (you know better?)
- Windows - follow instructions above (may not be supported for Windows 7!)