[2]:
%run loadmlfuncs.py
A First Look at Deep Learning
The MNIST dataset contains 60,000 training images and 10,000 test images assembled by the National Institute of Standards and Technology (NIST). Each image is a grayscale, 28 \(\times\) 28 pixel picture of a handwritten digit, and the task is to classify each image into one of 10 categories (the digits 0 through 9).
[3]:
from keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
print('training images:{}, test images:{}'.format(train_images.shape, test_images.shape))
Using TensorFlow backend.
training images:(60000, 28, 28), test images:(10000, 28, 28)
[4]:
def showimg(data, idx):
    # Display `span` consecutive images, with their labels, starting at `idx`.
    span = 5
    if data == 'train':
        images, labels = train_images, train_labels
    else:
        images, labels = test_images, test_labels
    if idx + span >= images.shape[0]:
        print('Index is out of range.')
        return
    plt.figure(figsize=(20, 4))
    for i in range(span):
        plt.subplot(1, span, i + 1)
        digit = images[idx + i]
        plt.imshow(digit, cmap=plt.cm.binary)
        plt.title('Index:{}, Label:{}'.format(idx + i, labels[idx + i]), fontsize=15)
    plt.show()

interact(showimg,
         data=widgets.RadioButtons(options=['train', 'test'],
                                   value='train', description='Data:', disabled=False),
         idx=widgets.IntText(value=7, description='Index:', disabled=False));
Network Architecture
The core building block of neural networks is the layer, a data-processing module that acts as a filter for data. A layer extracts representations out of the data fed into it, ideally in a form that is more useful for the task at hand; such representations are often called features.
Most of deep learning consists of chaining together simple layers that implement a form of progressive data distillation. A deep-learning model is like a sieve for data processing, made of a succession of increasingly refined data filters: the layers.
network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(10, activation='softmax'))
Here, our network consists of a sequence of two densely connected (fully connected) layers. The second (and last) layer is a 10-way softmax layer, which means it will return an array of 10 probability scores (summing to 1). Each score will be the probability that the current digit image belongs to one of our 10 digit classes.
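To make the softmax idea concrete, here is a minimal NumPy sketch (not part of the notebook's pipeline; the raw scores are made up for illustration) showing that exponentiating and normalizing a vector of 10 scores yields 10 probabilities that sum to 1:
import numpy as np

raw_scores = np.array([2.0, 1.0, 0.1, -1.2, 0.5, 3.3, 0.0, -0.7, 1.8, 0.2])  # hypothetical layer outputs
probs = np.exp(raw_scores) / np.sum(np.exp(raw_scores))                      # softmax
print(probs.round(3), probs.sum())                                           # 10 probabilities summing to 1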
Compilation
Before training the network, we need to perform a compilation step by specifying:
An optimizer: the mechanism through which the network updates itself in order to improve its performance on the training data
A loss function: how the network measures its performance on the training data, and hence how it steers itself in the right direction
Metrics to monitor during training and testing; here we only track accuracy
network.compile(optimizer='rmsprop',
loss='categorical_crossentropy',
metrics=['accuracy'])
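As a rough intuition for the loss, categorical crossentropy compares the predicted probability vector against the one-hot true label and is small when the correct class receives high probability. A minimal NumPy sketch with made-up numbers:
import numpy as np

y_true = np.array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0])                 # one-hot label for digit 3
y_pred = np.array([0.02, 0.01, 0.05, 0.80, 0.02,
                   0.03, 0.02, 0.02, 0.02, 0.01])                 # hypothetical softmax output
print(-np.sum(y_true * np.log(y_pred)))                           # categorical crossentropy, about 0.22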
Data Preparation
train_images_reshape = train_images.reshape((60000, 28 * 28))
train_images_reshape = train_images_reshape.astype('float32') / 255
test_images_reshape = test_images.reshape((10000, 28 * 28))
test_images_reshape = test_images_reshape.astype('float32') / 255
train_labels_cat = to_categorical(train_labels)
test_labels_cat = to_categorical(test_labels)
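As a quick sanity check (once the preparation code above has been run): each image becomes a flat vector of 784 values in [0, 1], and each integer label becomes a one-hot vector of length 10.
print(train_images_reshape.shape, train_images_reshape.dtype)     # (60000, 784) float32
print(train_images_reshape.min(), train_images_reshape.max())     # 0.0 1.0
print(train_labels[0], train_labels_cat[0])                       # the first label and its one-hot encoding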
Fitting
We now train (fit) the network on the training images so that it can later classify images from the test set.
network.fit(train_images_reshape, train_labels_cat, epochs=5, batch_size=128)
[5]:
from keras import models
from keras import layers
from keras.utils import to_categorical
network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(10, activation='softmax'))
network.compile(optimizer='rmsprop',
loss='categorical_crossentropy',
metrics=['accuracy'])
train_images_reshape = train_images.reshape((60000, 28 * 28))
train_images_reshape = train_images_reshape.astype('float32') / 255
test_images_reshape = test_images.reshape((10000, 28 * 28))
test_images_reshape = test_images_reshape.astype('float32') / 255
train_labels_cat = to_categorical(train_labels)
test_labels_cat = to_categorical(test_labels)
network.fit(train_images_reshape, train_labels_cat, epochs=5, batch_size=128)
test_loss, test_acc = network.evaluate(test_images_reshape, test_labels_cat)
print('test accuracy:', test_acc)
Epoch 1/5
60000/60000 [==============================] - 4s 63us/step - loss: 0.2540 - acc: 0.9247
Epoch 2/5
60000/60000 [==============================] - 4s 63us/step - loss: 0.1055 - acc: 0.9691
Epoch 3/5
60000/60000 [==============================] - 4s 61us/step - loss: 0.0690 - acc: 0.9793
Epoch 4/5
60000/60000 [==============================] - 3s 56us/step - loss: 0.0510 - acc: 0.9847
Epoch 5/5
60000/60000 [==============================] - 3s 54us/step - loss: 0.0375 - acc: 0.9889
10000/10000 [==============================] - 0s 47us/step
test accuracy: 0.9778
We reach an accuracy of 98.9% on the training data, but the test-set accuracy is only 97.8%. Although the difference looks small, the error rate roughly doubles, from about 1.1% on the training set to about 2.2% on the test set. This gap between training accuracy and test accuracy is an example of overfitting: the model performs worse on new data than on the data it was trained on.
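One way to watch this gap during training, rather than only at evaluation time, is to hold out part of the training data as a validation set. The sketch below is not part of the original run; it simply re-fits the same network using Keras's validation_split argument (the 0.2 fraction is an arbitrary choice), so the per-epoch training and validation accuracies can be compared:
history = network.fit(train_images_reshape, train_labels_cat,
                      epochs=5, batch_size=128,
                      validation_split=0.2)                        # hold out 20% of the training data
# metric keys are 'acc'/'val_acc' in this Keras version (see the training log above)
print(history.history['acc'][-1], history.history['val_acc'][-1])  # final training vs. validation accuracy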
Prediction Error
Below we show a few test images that the trained network misclassifies.
[6]:
predicted = network.predict_classes(test_images_reshape)
misclassified = np.where(predicted != test_labels)[0]          # indices where prediction and label disagree
print('# of misclassified images:', misclassified.shape[0])
plt.figure(figsize=(20, 4))
for i in range(5):
    plt.subplot(1, 5, i + 1)
    idx = misclassified[i]
    digit = test_images[idx]
    plt.imshow(digit, cmap=plt.cm.binary)
    plt.title('Predicted:{}, Label:{}'.format(predicted[idx], test_labels[idx]), fontsize=15)
plt.show()
# of misclassified images: 222
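To see which digits get confused with which, the misclassifications can be tabulated into a confusion matrix. A minimal NumPy sketch using the predicted and test_labels arrays from above:
confusion = np.zeros((10, 10), dtype=int)
for pred, true in zip(predicted, test_labels):
    confusion[true, pred] += 1                                 # rows: true label, columns: predicted label
print(confusion)                                               # off-diagonal counts are the misclassifications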