Applied Statistics

1MS926, Spring 2019, Uppsala University

©2019 Raazesh Sainudiin. Attribution 4.0 International (CC BY 4.0)

Assignment 1 for Course 1MS926

Fill in your Personal Number, make sure you pass the # ... Test cells and submit by email from your official uu.se student email account to raazesh.sainudiin @ math.uu.se with Subject line 1MS926 Assignment 1. You can submit multiple times before the deadline and your highest score will be used.

In [ ]:
# Enter your 12 digit personal number here and evaluate this cell
MyPersonalNumber = 'YYYYMMDDXXXX'

#tests
assert(isinstance(MyPersonalNumber, str))
assert(MyPersonalNumber.isdigit())
assert(len(MyPersonalNumber)==12)

Assignment 1, PROBLEM 0

Maximum Points = 3

Given that you are in the civil engineering programme in systems in technology and society, spend some time reading the following: "Anatomy of an AI System" by Kate Crawford and Vladan Joler (2018), https://anatomyof.ai/

Answer whether each of the following statements is True or False according to the authors by replacing Xxxxx in the next cell for TruthValueOfStatement0a, TruthValueOfStatement0b and TruthValueOfStatement0c, respectively, to demonstrate your reading comprehension.

  1. Statement0a = Each small moment of convenience (provided by Amazon's Echo) – be it answering a question, turning on a light, or playing a song – requires a vast planetary network, fueled by the extraction of non-renewable materials, labor, and data.
  2. Statement0b = The Echo user is simultaneously a consumer, a resource, a worker, and a product
  3. Statement0c = Many of the assumptions about human life made by machine learning systems are narrow, normative and laden with error. Yet they are inscribing and building those assumptions into a new world, and will increasingly play a role in how opportunities, wealth, and knowledge are distributed.
In [ ]:
# Replace Xxxxx with True or False; Don't modify anything else in this cell!

TruthValueOfStatement0a = Xxxxx

TruthValueOfStatement0b = Xxxxx

TruthValueOfStatement0c = Xxxxx

Local Test for Assignment 1, PROBLEM 0

Evaluate the cell below to make sure your answer is valid. Do not modify anything in that cell when you run this local test of your solution.

In [13]:
# Test locally to ensure an acceptable answer, True or False
try:
    assert(isinstance(TruthValueOfStatement0a, bool)) 
    assert(isinstance(TruthValueOfStatement0b, bool)) 
    assert(isinstance(TruthValueOfStatement0c, bool))
    print("Good, you have answered either True or False. Hopefully they are the correct answers!")
except AssertionError:
    print("Try again. You are not writing True or False for your answers.")
Good, you have answered either True or False. Hopefully they are the correct answers!

Assignment 1, PROBLEM 1

Maximum Points = 1

NonAutoGraded

Summarise in 50-100 words your thoughts after reading https://anatomyof.ai/, "Anatomy of an AI System" by Kate Crawford and Vladan Joler (2018).

You can double-click this cell and write your summary in English between the two --- lines below. When you are done, press Ctrl-Enter (hold down the Ctrl key and hit the Enter key) to see how it looks in display mode.




Assignment 1, PROBLEM 2

Maximum Points = 2

Finding out the number of lines and characters in a file

Evaluate the following two cells by replacing X with the right command-line option to the wc command in order to find:

  1. the number of lines in data/earthquakes_small.csv and
  2. the number of characters in data/earthquakes_small.csv

Finally, update the following cell by replacing XXX with the right integer answers, respectively, for:

  1. NumberOfLinesIn_earthquakes_small_csv_file and
  2. NumberOfCharactersIn_earthquakes_small_csv_file

Here is a brief synopsis of wc that you would get from running man wc as follows:

%%sh
man wc
WC(1)                     BSD General Commands Manual                    WC(1)

NAME
     wc -- word, line, character, and byte count

SYNOPSIS
     wc [-clmw] [file ...]

DESCRIPTION
     The wc utility displays the number of lines, words, and bytes contained in each input file, or standard input (if no file is specified) to the standard output.  A line is defined as a string of characters delimited by a <newline> character.  Characters beyond the final <newline> character will not be included in the line count.

     A word is defined as a string of characters delimited by white space characters.  White space characters are the set of characters for which the iswspace(3) function returns true.  If more than one input file is specified, a line of cumulative counts for all the files is displayed on a separate line after the output for the last file.

     The following options are available:

     -c      The number of bytes in each input file is written to the standard output.  This will cancel out any prior usage of the -m option.

     -l      The number of lines in each input file is written to the standard output.

     -m      The number of characters in each input file is written to the standard output.  If the current locale does not support multibyte
             characters, this is equivalent to the -c option.  This will cancel out any prior usage of the -c option.

     -w      The number of words in each input file is written to the standard output.

     When an option is specified, wc only reports the information requested by that option.  The order of output always takes the form of line, word, byte, and file name.  The default action is equivalent to specifying the -c, -l and -w options.
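To make the difference between the line, character and byte counts described above concrete, here is a minimal sketch in Python 3. The string `sample` is made up for illustration and is not the contents of data/earthquakes_small.csv; the variable names are likewise only assumptions of this sketch.

```python
# Hypothetical sample text (NOT the contents of data/earthquakes_small.csv).
# Note that the last line has no trailing newline.
sample = "time,magnitude\n2019-01-01,4.5\nÅre,3.1"

# A line is a string of characters delimited by a newline, so counting
# lines amounts to counting newline characters.
line_count = sample.count("\n")

# Characters and bytes differ when the text has multibyte characters:
# 'Å' is one character but two bytes in UTF-8.
char_count = len(sample)
byte_count = len(sample.encode("utf-8"))

print(line_count, char_count, byte_count)
```

The text after the final newline does not add to the line count, which matches the man page's remark that characters beyond the final newline are not included in it.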
In [ ]:
%%sh
# replace X in the next line with the right option to find the number of lines
wc -X data/earthquakes_small.csv
In [ ]:
%%sh
# replace X in the next line with the right option to find the number of characters
wc -X data/earthquakes_small.csv
In [11]:
# Write your answer below by replacing XXX. Don't modify anything else!

NumberOfLinesIn_earthquakes_small_csv_file = XXX
NumberOfCharactersIn_earthquakes_small_csv_file = XXX

Local Test for Assignment 1, PROBLEM 2

Evaluate the cell below to make sure your answer is valid. Do not modify anything in that cell when you run this local test of your solution.

In [12]:
# Evaluate this cell locally to make sure you have the answer as a non-negative integer
try:
    assert(NumberOfLinesIn_earthquakes_small_csv_file > -1)
    print("Good! You have 0 or more lines as your answer. Hopefully it is correct!")
except AssertionError:
    print("Try again. You do not seem to have a valid number of lines as your answer.")
try:
    assert(NumberOfCharactersIn_earthquakes_small_csv_file > -1)
    print("Good! You have 0 or more characters as your answer. Hopefully it is correct!")
except AssertionError:
    print("Try again. You do not seem to have a valid number of characters as your answer.")
Good! You have 0 or more lines as your answer. Hopefully it is correct!
Good! You have 0 or more characters as your answer. Hopefully it is correct!

Assignment 1, PROBLEM 3

Maximum Points = 1

Consider the experiment where we roll two fair dice independently.

Let $D$ be the event that "the sum of the two dice is 8" and let $C$ be the event that "the first die is 2".

What is the probability of D given C, i.e. what is $P(D|C)$?
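As a reminder of how a conditional probability can be found by counting equally likely outcomes, here is a minimal Python sketch for a different pair of events: "the sum is at least 10" given "the first die is 6". The helper name `conditional_probability` and the two example events are assumptions made for this illustration; they are not part of the assignment.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first die, second die) of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def conditional_probability(event, given):
    """Return P(event | given) by counting equally likely outcomes."""
    given_outcomes = [w for w in outcomes if given(w)]
    both = [w for w in given_outcomes if event(w)]
    return Fraction(len(both), len(given_outcomes))

# Illustrative events only; these are not the events D and C of this problem.
def sum_at_least_10(w):
    return w[0] + w[1] >= 10

def first_is_6(w):
    return w[0] == 6

p = conditional_probability(sum_at_least_10, first_is_6)
print(p)
```

Conditioning on the first die leaves six outcomes, and the answer is the fraction of those in which the event also occurs.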

Do the calculation by hand and write the answer in the next cell by assigning the variable ProbOfDGivenC.

In [6]:
# Replace XXX below with the correct answer to Assignment 1 Problem 3
# Do NOT change the name of the variable ProbOfDGivenC
ProbOfDGivenC = XXX

Local Test for Assignment 1, PROBLEM 3

Evaluate the cell below to make sure your answer is valid. Do not modify anything in that cell when you run this local test of your solution.

In [ ]:
# test that your answer is indeed a probability by evaluating this cell after you replaced XXX above and evaluated it.
try:
    assert(ProbOfDGivenC >= 0 and ProbOfDGivenC <= 1)
    print("Your answer is a probability, hopefully it is correct.")
except AssertionError:
    print("Try again! and make sure you are actually producing a valid probability, i.e., a real number in [0,1]")

Assignment 1, PROBLEM 4

Maximum Points = 2

Recall that for a given parameter $\theta \in [0,1]$, the probability mass function (PMF) for the $Bernoulli(\theta)$ RV $X$ is:

$$ \begin{equation} f(x;\theta)= \theta^x (1-\theta)^{1-x} \mathbf{1}_{\{0,1\}}(x) = \begin{cases} \theta & \text{if $x=1$,}\\ 1-\theta & \text{if $x=0$,}\\ 0 & \text{otherwise} \end{cases} \end{equation} $$

In the next cell write a function named pmfOfBernoulli that takes in two arguments:

  • the first argument is x and
  • the second argument is theta

and returns the value for $f(x; \theta)$.
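The indicator function $\mathbf{1}_{A}(x)$ in the formula above is what makes the PMF zero outside its support. As a minimal sketch of that idea for a different random variable, the following Python code defines the PMF of a fair six-sided die, $f(x) = \frac{1}{6}\mathbf{1}_{\{1,\ldots,6\}}(x)$. The names `indicator` and `pmf_fair_die` are assumptions made for this illustration and are not part of the assignment.

```python
from fractions import Fraction

def indicator(x, support):
    """Return 1 if x is in support and 0 otherwise, i.e. the indicator 1_A(x)."""
    return 1 if x in support else 0

def pmf_fair_die(x):
    """PMF of a fair six-sided die: (1/6) * 1_{1,...,6}(x)."""
    return Fraction(1, 6) * indicator(x, {1, 2, 3, 4, 5, 6})

# A PMF is non-negative and sums to 1 over its support;
# values outside the support contribute 0.
total = sum(pmf_fair_die(x) for x in range(-2, 10))
print(pmf_fair_die(3), pmf_fair_die(7), total)
```

The same two checks (zero outside the support, total mass 1) are a useful sanity test for any PMF you write yourself.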

In [ ]:
# Replace RRR...RRR below. Do NOT change the name of the function `pmfOfBernoulli`!

def pmfOfBernoulli(x, theta):
    '''RRR ... RRR'''
    RRR
    RRR
    RRR
    ...
    RRR

Local Test for Assignment 1, PROBLEM 4

Evaluate the cell below to make sure your answer is valid. Do not modify anything in that cell when you run this local test of your solution.

In [84]:
# Evaluate this to locally test that your solution is returning probabilities
try:
    # use 0.5 rather than 1/2 so that theta is not truncated to 0 under Python 2 integer division
    assert (pmfOfBernoulli(0, 0.5) >= 0) and (pmfOfBernoulli(1, 0.5) <= 1)
    print("You seem to have a valid probability for your answer. Hopefully it is correct!")
except Exception:
    print("Try again. You don't have a valid probability,\n"
          "i.e., a real number in the unit interval [0,1] for your answer")
You seem to have a valid probability for your answer. Hopefully it is correct!