ScaDaMaLe Course site and book

This is a 2019-2021 augmentation and update of Adam Breindel's initial notebooks.

Thanks to Christian von Koch and William Anzén for their contributions towards making these materials Spark 3.0.1 and Python 3+ compliant.

TensorFlow

... is a general math framework

TensorFlow is designed to accommodate...

- Easy operations on tensors (n-dimensional arrays)
- Mappings to performant low-level implementations, including native CPU and GPU
- Optimization via gradient descent variants
  - Including high-performance differentiation

Low-level math primitives called "Ops"

From these primitives, linear algebra and other higher-level constructs are formed.

Going up one more level, common neural-net components have been built and included.

At an even higher level of abstraction, various libraries have been created that simplify building and wiring common network patterns. Over the last 2 years, we've seen 3-5 such libraries.

We will focus later on one, Keras, which has now been adopted as the "official" high-level wrapper for TensorFlow.

We'll get familiar with TensorFlow so that it is not a "magic black box"

But for most of our work, it will be more productive to work with the higher-level wrappers. At the end of this notebook, we'll make the connection between the Keras API we've used and the TensorFlow code underneath.

import tensorflow as tf

x = tf.constant(100, name='x')
y = tf.Variable(x + 50, name='y')

print(y)

WARNING:tensorflow:From /databricks/python/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
<tf.Variable 'y:0' shape=() dtype=int32_ref>

There's a bit of "ceremony" there...

... and ... where's the actual output?

For performance reasons, TensorFlow separates the design of the computation from the actual execution.

TensorFlow programs describe a computation graph -- an abstract DAG of data flow -- that can then be analyzed, optimized, and implemented on a variety of hardware, as well as potentially scheduled across a cluster of separate machines.

Like many query engines and compute graph engines, evaluation is lazy ... so we don't get "real numbers" until we force TensorFlow to run the calculation:

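# variables must be initialized explicitly before they can be read
init_node = tf.global_variables_initializer()

# a Session owns the runtime that actually executes graph nodes
with tf.Session() as session:
    session.run(init_node)   # run the initializer node
    print(session.run(y))    # evaluate y and fetch its value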
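
To make the "design first, execute later" split concrete, here is a toy sketch in plain Python. It is not TensorFlow code: the names Node, constant, add and run are hypothetical and exist only for this illustration. Building the expression just records a small graph; no arithmetic happens until run is called, which plays roughly the role of session.run.

```python
# Toy illustration of "describe the graph first, run it later".
# Node, constant, add and run are hypothetical names, not TensorFlow APIs.

class Node:
    def __init__(self, op, inputs=(), value=None):
        self.op = op          # "const" or "add"
        self.inputs = inputs  # upstream nodes this node depends on
        self.value = value    # only used by "const" nodes

def constant(value):
    return Node("const", value=value)

def add(a, b):
    return Node("add", inputs=(a, b))

def run(node):
    """Walk the graph and compute the value of `node` on demand."""
    if node.op == "const":
        return node.value
    if node.op == "add":
        return run(node.inputs[0]) + run(node.inputs[1])
    raise ValueError("unknown op: " + node.op)

x = constant(100)
y = add(x, constant(50))  # builds a graph node; nothing is added yet

print(y.op)       # the node only describes the computation
result = run(y)   # evaluation is forced here
print(result)
```

A real engine can inspect, optimize and place such a graph on different hardware before running it; this sketch only shows the deferral.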

TensorFlow integrates tightly with NumPy

and we typically use NumPy to create and manage the tensors (vectors, matrices, etc.) that will "flow" through our graph

New to NumPy? Grab a cheat sheet: https://s3.amazonaws.com/assets.datacamp.com/blogassets/NumpyPythonCheatSheet.pdf

import numpy as np

data = np.random.normal(loc=10.0, scale=2.0, size=[3,3]) # mean 10, std dev 2

print(data)

[[11.39104009 10.74646715  9.97434901]
 [10.85377817 10.18047268  8.54517234]
 [ 7.50991424  8.92290223  7.32137675]]

# all nodes get added to default graph (unless we specify otherwise)
# we can reset the default graph -- so it's not cluttered up:
tf.reset_default_graph()

x = tf.constant(data, name='x')
y = tf.Variable(x * 10, name='y')

init_node = tf.global_variables_initializer()

with tf.Session() as session:
    session.run(init_node)
    print(session.run(y))

[[113.91040088 107.46467153  99.74349013]
 [108.53778169 101.80472682  85.45172335]
 [ 75.09914241  89.22902232  73.21376755]]

We will often iterate on a calculation ...

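Each call to session.run performs a single evaluation, so we can iterate with an ordinary Python loop. Note that x = x + 1 does not update a value in place: on each pass it adds a new addition node to the graph and rebinds the Python name x to that node: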

with tf.Session() as session:
    for i in range(3):
        x = x + 1
        print(session.run(x))
        print("----------------------------------------------")

[[12.39104009 11.74646715 10.97434901]
 [11.85377817 11.18047268  9.54517234]
 [ 8.50991424  9.92290223  8.32137675]]
----------------------------------------------
[[13.39104009 12.74646715 11.97434901]
 [12.85377817 12.18047268 10.54517234]
 [ 9.50991424 10.92290223  9.32137675]]
----------------------------------------------
[[14.39104009 13.74646715 12.97434901]
 [13.85377817 13.18047268 11.54517234]
 [10.50991424 11.92290223 10.32137675]]
----------------------------------------------

Optimizers

TF includes a set of built-in algorithm implementations (though you could certainly write them yourself) for performing optimization.

These are oriented around gradient-descent methods, with a set of handy extension flavors to make things converge faster.

Using TF optimizer to solve problems

We can use the optimizers to solve anything (not just neural networks) so let's start with a simple equation.

We supply a set of data points that represent inputs. We generate them from a known, simple equation (y will always be 2*x + 6), but we won't tell TF that. Instead, we give TF a function structure (linear, with 2 parameters) and let TF try to figure out the parameters by minimizing an error function.

What is the error function?

The "real" error is the absolute value of the difference between TF's current approximation and our ground-truth y value.

But absolute value is not a friendly function to work with here (its derivative is undefined at zero), so instead we'll square the difference. That gets us a nice, smooth function that TF can differentiate, and it reaches its minimum of zero in exactly the same place:

np.random.rand()

x = tf.placeholder("float") 
y = tf.placeholder("float")

m = tf.Variable([1.0], name="m-slope-coefficient") # initial values ... for now they don't matter much
b = tf.Variable([1.0], name="b-intercept")

y_model = tf.multiply(x, m) + b

error = tf.square(y - y_model)

train_op = tf.train.GradientDescentOptimizer(0.01).minimize(error)

model = tf.global_variables_initializer()

with tf.Session() as session:
    session.run(model)
    for i in range(10):
        x_value = np.random.rand()
        y_value = x_value * 2 + 6 # we know these params, but we're making TF learn them
        session.run(train_op, feed_dict={x: x_value, y: y_value})

    out = session.run([m, b])
    print(out)
    print("Model: {r:.3f}x + {s:.3f}".format(r=out[0][0], s=out[1][0]))

[array([1.6890278], dtype=float32), array([1.9976655], dtype=float32)]
Model: 1.689x + 1.998

That's pretty terrible :)

Try two experiments. Change the number of iterations the optimizer runs, and (independently) try changing the learning rate, which is the number we passed to GradientDescentOptimizer.

See what happens with different values.
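
If you want to explore those two knobs without a TensorFlow session, the sketch below redoes the same fit in plain NumPy. It is an approximation of the idea, not TF's optimizer: fit_line is a hypothetical helper, the gradients of the squared error are written out by hand, and the random seed and the (steps, learning rate) pairs are arbitrary choices made for illustration.

```python
import numpy as np

def fit_line(steps, learning_rate, seed=0):
    """Hand-written gradient descent on squared error for y_model = m*x + b."""
    rng = np.random.RandomState(seed)
    m, b = 1.0, 1.0  # same starting values as the TF example
    for _ in range(steps):
        x_value = rng.rand()
        y_value = x_value * 2 + 6          # ground truth the model must learn
        residual = y_value - (m * x_value + b)
        # error = residual**2, so by the chain rule:
        grad_m = -2.0 * residual * x_value
        grad_b = -2.0 * residual
        m -= learning_rate * grad_m        # step against the gradient
        b -= learning_rate * grad_b
    return m, b

# Illustrative settings only: few vs. many steps, small vs. larger rate
for steps, learning_rate in [(10, 0.01), (10000, 0.01), (10000, 0.1)]:
    m, b = fit_line(steps, learning_rate)
    print("steps={}, rate={}: {:.3f}x + {:.3f}".format(steps, learning_rate, m, b))
```

With only a handful of small steps the parameters barely move from their starting values; many more steps (or a somewhat larger rate) let them approach 2 and 6.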

These are scalars. Where do the tensors come in?

Using matrices allows us to represent (and, with the right hardware, compute) the data-weight dot products for lots of data vectors (a mini batch) and lots of weight vectors (neurons) at the same time.

Tensors are useful because some of our data "vectors" are really multidimensional -- for example, with a color image we may want to preserve height, width, and color planes. We can hold multiple color images, with their shapes, in a 4-D (or 4 "axis") tensor.
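
A small NumPy sketch of both points, with shapes chosen arbitrarily for illustration (4 data vectors, 3 features, 2 neurons, and a batch of 8 images of 32 x 32 pixels with 3 color planes):

```python
import numpy as np

rng = np.random.RandomState(0)

# A mini batch of 4 data vectors, each with 3 features
batch = rng.normal(size=(4, 3))
# Weights for 2 neurons: one column of 3 weights per neuron
weights = rng.normal(size=(3, 2))

# One matrix multiply computes all 4 x 2 data-weight dot products at once
outputs = batch @ weights
print(outputs.shape)

# The same numbers, computed one dot product at a time
one_by_one = np.array([[np.dot(row, weights[:, j]) for j in range(2)]
                       for row in batch])
print(np.allclose(outputs, one_by_one))

# A batch of color images as a 4-D tensor: (image, height, width, color plane)
images = np.zeros((8, 32, 32, 3))
print(images.ndim, images.shape)
```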

Let's also make the connection from Keras down to TensorFlow.

We used a Keras class called Dense, which represents a "fully-connected" layer of -- in this case -- linear perceptrons. Let's look at the source code to that, just to see that there's no mystery.

https://github.com/fchollet/keras/blob/master/keras/layers/core.py

It calls down to the "back end" by calling output = K.dot(inputs, self.kernel) where kernel means this layer's weights.

K represents the pluggable backend wrapper. You can trace K.dot on TensorFlow by looking at

https://github.com/fchollet/keras/blob/master/keras/backend/tensorflow_backend.py

Look for def dot(x, y): and look right toward the end of the method. The math is done by calling tf.matmul(x, y)
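
Stripped of the framework plumbing, the forward pass of a linear fully-connected layer is roughly the following. This is a NumPy sketch, not the Keras source: dense_forward is a hypothetical helper, the input, kernel and bias values are made up, and the bias term is included on the assumption that the layer uses one.

```python
import numpy as np

def dense_forward(inputs, kernel, bias):
    """Linear fully-connected layer: one matrix multiply plus a bias."""
    return np.matmul(inputs, kernel) + bias

inputs = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])   # 2 examples, 3 features each
kernel = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])        # weights: 3 inputs -> 2 units
bias = np.array([0.5, -0.5])           # one bias per unit

output = dense_forward(inputs, kernel, bias)
print(output)
```

Each row of output holds the two unit activations for one input example.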

What else helps TensorFlow (and other frameworks) run fast?

- A fast, simple mechanism for calculating all of the partial derivatives we need, called reverse-mode autodifferentiation
- Implementations of low-level operations in optimized CPU code (e.g., C++, MKL) and GPU code (CUDA/cuDNN/HLSL)
- Support for distributed parallel training, although parallelizing deep learning is non-trivial ... not automagic like with, e.g., Apache Spark
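
To take some of the mystery out of reverse-mode autodifferentiation, here is a toy version for scalars in plain Python. It is a sketch of the general idea only, not how TensorFlow implements it: Var and backward are hypothetical names, and only +, - and * are supported. Each operation records its inputs and local derivatives; one backward sweep from the output then yields every partial derivative.

```python
class Var:
    """A scalar that remembers how it was computed (hypothetical toy class)."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __sub__(self, other):
        return Var(self.value - other.value, ((self, 1.0), (other, -1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

def backward(output):
    """Accumulate d(output)/d(node) into node.grad for every node."""
    order, seen = [], set()

    def visit(node):  # depth-first walk: parents are listed before the node
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    # Reverse order: each node is handled after everything that consumes it
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad  # chain rule

# Squared error of a linear model at one made-up data point
x, y = Var(0.5), Var(7.0)
m, b = Var(1.0), Var(1.0)
residual = y - (m * x + b)
error = residual * residual
backward(error)
print(error.value, m.grad, b.grad)
```

The single backward pass fills in the gradient for m and b together, which is why this approach scales well when there are many parameters and one scalar error.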

That is the essence of TensorFlow!

There are three principal directions to explore further:

- Working with tensors instead of scalars: this is not intellectually difficult, but it takes some practice to wrangle the shaping and re-shaping of tensors. If you get the shape of a tensor wrong, your script will blow up.
- Building more complex models. You can write these yourself using lower-level "Ops" (like matrix multiply) or using higher-level functions like tf.layers.dense. Use the source, Luke!
- Operations and integration ecosystem: as TensorFlow has matured, it has become easier to integrate additional tools and solve the peripheral problems:
  - TensorBoard for visualizing training
  - tfdbg command-line debugger
  - Distributed TensorFlow for clustered training
  - GPU integration
  - Feeding large datasets from external files
  - TensorFlow Serving for serving models (i.e., using an existing model to predict on new incoming data)

Distributed Training for DL

https://docs.azuredatabricks.net/applications/deep-learning/distributed-training/horovod-runner.html
- https://docs.azuredatabricks.net/applications/deep-learning/distributed-training/mnist-tensorflow-keras.html
https://software.intel.com/en-us/articles/bigdl-distributed-deep-learning-on-apache-spark
