057_DLbyABr_04b-Hands-On-MNIST-CNN(Python)

SDS-2.x, Scalable Data Engineering Science

This is a 2019 augmentation and update of Adam Breindel's initial notebooks.

Try out the first ConvNet -- the one we looked at earlier.

This code is the same, but we'll run it for 20 epochs so we can get a better feel for the fitting/validation/overfitting trend.

from keras.utils import to_categorical
import sklearn.datasets

train_libsvm = "/dbfs/databricks-datasets/mnist-digits/data-001/mnist-digits-train.txt"
test_libsvm = "/dbfs/databricks-datasets/mnist-digits/data-001/mnist-digits-test.txt"

X_train, y_train = sklearn.datasets.load_svmlight_file(train_libsvm, n_features=784)
X_train = X_train.toarray()

X_test, y_test = sklearn.datasets.load_svmlight_file(test_libsvm, n_features=784)
X_test = X_test.toarray()

X_train = X_train.reshape( (X_train.shape[0], 28, 28, 1) )
X_train = X_train.astype('float32')
X_train /= 255
y_train = to_categorical(y_train, num_classes=10)

X_test = X_test.reshape( (X_test.shape[0], 28, 28, 1) )
X_test = X_test.astype('float32')
X_test /= 255
y_test = to_categorical(y_test, num_classes=10)
Using TensorFlow backend.
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten, Conv2D, MaxPooling2D

model = Sequential()

model.add(Conv2D(8, # number of kernels 
                (4, 4), # kernel size
                padding='valid', # no padding; output will be smaller than input
                input_shape=(28, 28, 1)))

model.add(Activation('relu'))

model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu')) # alternative syntax for applying activation

model.add(Dense(10))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

history = model.fit(X_train, y_train, batch_size=128, epochs=20, verbose=2, validation_split=0.1)

scores = model.evaluate(X_test, y_test, verbose=1)

print()
for i in range(len(model.metrics_names)):
    print("%s: %f" % (model.metrics_names[i], scores[i]))
Train on 54000 samples, validate on 6000 samples
Epoch 1/20 - 11s - loss: 0.3233 - acc: 0.9068 - val_loss: 0.1206 - val_acc: 0.9653
Epoch 2/20 - 10s - loss: 0.1111 - acc: 0.9673 - val_loss: 0.0753 - val_acc: 0.9808
Epoch 3/20 - 10s - loss: 0.0687 - acc: 0.9791 - val_loss: 0.0625 - val_acc: 0.9818
Epoch 4/20 - 10s - loss: 0.0488 - acc: 0.9855 - val_loss: 0.0521 - val_acc: 0.9855
Epoch 5/20 - 10s - loss: 0.0385 - acc: 0.9883 - val_loss: 0.0507 - val_acc: 0.9858
Epoch 6/20 - 10s - loss: 0.0311 - acc: 0.9903 - val_loss: 0.0485 - val_acc: 0.9862
Epoch 7/20 - 10s - loss: 0.0257 - acc: 0.9920 - val_loss: 0.0486 - val_acc: 0.9880
Epoch 8/20 - 10s - loss: 0.0210 - acc: 0.9934 - val_loss: 0.0462 - val_acc: 0.9875
Epoch 9/20 - 10s - loss: 0.0173 - acc: 0.9948 - val_loss: 0.0500 - val_acc: 0.9865
Epoch 10/20 - 10s - loss: 0.0150 - acc: 0.9954 - val_loss: 0.0471 - val_acc: 0.9878
Epoch 11/20 - 10s - loss: 0.0116 - acc: 0.9966 - val_loss: 0.0608 - val_acc: 0.9840
Epoch 12/20 - 10s - loss: 0.0106 - acc: 0.9968 - val_loss: 0.0459 - val_acc: 0.9885
Epoch 13/20 - 10s - loss: 0.0098 - acc: 0.9970 - val_loss: 0.0511 - val_acc: 0.9873
Epoch 14/20 - 10s - loss: 0.0080 - acc: 0.9976 - val_loss: 0.0586 - val_acc: 0.9857
Epoch 15/20 - 10s - loss: 0.0074 - acc: 0.9979 - val_loss: 0.0495 - val_acc: 0.9890
Epoch 16/20 - 10s - loss: 0.0052 - acc: 0.9984 - val_loss: 0.0545 - val_acc: 0.9890
Epoch 17/20 - 10s - loss: 0.0068 - acc: 0.9977 - val_loss: 0.0591 - val_acc: 0.9860
Epoch 18/20 - 10s - loss: 0.0076 - acc: 0.9974 - val_loss: 0.0567 - val_acc: 0.9890
Epoch 19/20 - 10s - loss: 0.0038 - acc: 0.9990 - val_loss: 0.0528 - val_acc: 0.9885
Epoch 20/20 - 10s - loss: 0.0029 - acc: 0.9992 - val_loss: 0.0629 - val_acc: 0.9865
10000/10000 [==============================] - 1s 130us/step
loss: 0.052445
acc: 0.986500
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
fig.set_size_inches((5,5))
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper left')
display(fig)
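
The plot makes the growing gap between the training and validation curves visible. As a quick numeric check, here is a minimal sketch that pulls the final-epoch values out of the same `history` object (the accuracy keys are 'acc'/'val_acc' in this Keras version, as in the output above):

final_train_acc = history.history['acc'][-1]    # training accuracy at the last epoch
final_val_acc = history.history['val_acc'][-1]  # validation accuracy at the last epoch
print("final training accuracy:   %f" % final_train_acc)
print("final validation accuracy: %f" % final_val_acc)
print("train/val accuracy gap:    %f" % (final_train_acc - final_val_acc))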

Next, let's try adding another convolutional layer:

model = Sequential()

model.add(Conv2D(8, # number of kernels 
                        (4, 4), # kernel size
                        padding='valid',
                        input_shape=(28, 28, 1)))

model.add(Activation('relu'))

model.add(Conv2D(8, (4, 4))) # <-- additional Conv layer
model.add(Activation('relu'))

model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))

model.add(Dense(10))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

history = model.fit(X_train, y_train, batch_size=128, epochs=15, verbose=2, validation_split=0.1)

scores = model.evaluate(X_test, y_test, verbose=1)

print()
for i in range(len(model.metrics_names)):
    print("%s: %f" % (model.metrics_names[i], scores[i]))
Train on 54000 samples, validate on 6000 samples
Epoch 1/15 - 20s - loss: 0.2958 - acc: 0.9148 - val_loss: 0.0759 - val_acc: 0.9790
Epoch 2/15 - 20s - loss: 0.0710 - acc: 0.9784 - val_loss: 0.0542 - val_acc: 0.9863
Epoch 3/15 - 20s - loss: 0.0478 - acc: 0.9854 - val_loss: 0.0474 - val_acc: 0.9870
Epoch 4/15 - 20s - loss: 0.0351 - acc: 0.9891 - val_loss: 0.0421 - val_acc: 0.9890
Epoch 5/15 - 20s - loss: 0.0270 - acc: 0.9917 - val_loss: 0.0524 - val_acc: 0.9857
Epoch 6/15 - 20s - loss: 0.0213 - acc: 0.9933 - val_loss: 0.0434 - val_acc: 0.9883
Epoch 7/15 - 20s - loss: 0.0184 - acc: 0.9942 - val_loss: 0.0447 - val_acc: 0.9888
Epoch 8/15 - 20s - loss: 0.0132 - acc: 0.9959 - val_loss: 0.0529 - val_acc: 0.9867
Epoch 9/15 - 20s - loss: 0.0111 - acc: 0.9964 - val_loss: 0.0481 - val_acc: 0.9882
Epoch 10/15 - 20s - loss: 0.0099 - acc: 0.9967 - val_loss: 0.0469 - val_acc: 0.9887
Epoch 11/15 - 20s - loss: 0.0080 - acc: 0.9971 - val_loss: 0.0697 - val_acc: 0.9843
Epoch 12/15 - 20s - loss: 0.0074 - acc: 0.9976 - val_loss: 0.0468 - val_acc: 0.9897
Epoch 13/15 - 20s - loss: 0.0052 - acc: 0.9982 - val_loss: 0.0526 - val_acc: 0.9893
Epoch 14/15 - 20s - loss: 0.0071 - acc: 0.9978 - val_loss: 0.0532 - val_acc: 0.9892
Epoch 15/15 - 19s - loss: 0.0059 - acc: 0.9979 - val_loss: 0.0510 - val_acc: 0.9902
10000/10000 [==============================] - 2s 168us/step
loss: 0.047584
acc: 0.988000

Still Overfitting

We're making progress on our test accuracy -- now about 98.8% -- but only a small gain for all the additional training time, because the network is overfitting the data.

There are a variety of techniques we can take to counter this -- forms of regularization.
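
For instance, one classic form of regularization is L2 weight decay on a layer's kernel. Purely as an illustrative sketch (it is not used in the model below, and the 0.001 penalty strength is an arbitrary, untuned value), a weight-decayed dense layer would look like:

from keras import regularizers
from keras.layers import Dense

# Illustrative only: an L2-penalized dense layer as an alternative (or complement) to dropout.
# The penalty strength of 0.001 is arbitrary and not tuned here.
dense_with_weight_decay = Dense(128, activation='relu',
                                kernel_regularizer=regularizers.l2(0.001))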

Let's try a relatively simple solution that works surprisingly well: add a pair of Dropout layers -- a layer type that randomly omits a fraction of its neurons on each training batch (thus exposing each neuron to only part of the training data).

We'll add more convolution kernels but shrink them to 3x3 as well.

model = Sequential()

model.add(Conv2D(32, # number of kernels 
                        (3, 3), # kernel size
                        padding='valid',
                        input_shape=(28, 28, 1)))

model.add(Activation('relu'))

model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))

model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))

model.add(Dropout(0.5))
model.add(Dense(10))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(X_train, y_train, batch_size=128, epochs=15, verbose=2)

scores = model.evaluate(X_test, y_test, verbose=2)

print()
for i in range(len(model.metrics_names)):
    print("%s: %f" % (model.metrics_names[i], scores[i]))
Epoch 1/15 - 66s - loss: 0.2854 - acc: 0.9140
Epoch 2/15 - 66s - loss: 0.0945 - acc: 0.9715
Epoch 3/15 - 66s - loss: 0.0727 - acc: 0.9783
Epoch 4/15 - 65s - loss: 0.0592 - acc: 0.9817
Epoch 5/15 - 66s - loss: 0.0501 - acc: 0.9840
Epoch 6/15 - 66s - loss: 0.0463 - acc: 0.9855
Epoch 7/15 - 66s - loss: 0.0420 - acc: 0.9865
Epoch 8/15 - 66s - loss: 0.0373 - acc: 0.9881
Epoch 9/15 - 65s - loss: 0.0346 - acc: 0.9893
Epoch 10/15 - 66s - loss: 0.0337 - acc: 0.9893
Epoch 11/15 - 66s - loss: 0.0305 - acc: 0.9901
Epoch 12/15 - 66s - loss: 0.0253 - acc: 0.9920
Epoch 13/15 - 66s - loss: 0.0246 - acc: 0.9918
Epoch 14/15 - 66s - loss: 0.0241 - acc: 0.9919
Epoch 15/15 - 65s - loss: 0.0244 - acc: 0.9923
loss: 0.027750
acc: 0.991800

Lab Wrapup

From this last run, you should have a test accuracy of over 99.1%.

For one more activity, try changing the optimizer to old-school "sgd" -- just to see how far we've come with these modern gradient descent techniques in the last few years.
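
A minimal sketch of that change -- rebuild the model exactly as above, then compile and fit with 'sgd' in place of 'adam' (so training starts from freshly initialized weights rather than the already-trained ones):

# Same network as above, but trained with plain stochastic gradient descent.
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
history = model.fit(X_train, y_train, batch_size=128, epochs=15, verbose=2)
scores = model.evaluate(X_test, y_test, verbose=2)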

Accuracy will end up noticeably worse ... about 96-97% test accuracy. Two key takeaways:

  • Without a good optimizer, even a very powerful network design may not achieve good results
  • In fact, we could replace the word "optimizer" there with
    • initialization
    • activation
    • regularization
    • (etc.)
  • All of these elements we've been working with operate together in a complex way to determine final performance