After finishing the Iris classifier and part of Andrew Ng’s ML course, in which he talks about building a neural network classifier for handwritten digits, I decided to try writing my own handwritten digit classifier using Keras.
Git
https://github.com/Tzeny/mnist-classfier
Dataset
To train and test the model I used the MNIST dataset. If you are using Keras you can automatically download and import it1:
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
I used the above method.
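The arrays come back as uint8 NumPy arrays; a quick sanity check of their shapes (these match the comments in the preprocessing code further down):
print(x_train.shape, x_train.dtype)  # (60000, 28, 28) uint8
print(y_train.shape)                 # (60000,)
print(x_test.shape)                  # (10000, 28, 28)
print(y_test.shape)                  # (10000,)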
Convolutional version
Unfortunately, I did not use the same random seed for these experiments, so take the results here with a grain of salt :(
Architecture | Accuracy on test set |
---|---|
2 conv layers w/ dropout | 98.66%, 98.71% |
2 conv layers w/ dropout, 2 dense | 98.67% |
2 conv layers w/o dropout | 98.60%, 98.59% |
1 conv layer w/ dropout | 97.65% |
4 conv layers w/ dropout | 98.8%, 98.36% |
3 conv layers w/ dropout | 98.95%, 98.99% |
3 conv layers w/ dropout, 2 dense, rmsprop | 98.82%, 98.72% |
3 conv layers w/ dropout, 2 dense, sgd | 99% |
3 conv layers w/o dropout, 2 dense, sgd | 99.0% |
3 conv layers, dense-dropout-dense-dropout-dense, sgd | 99.11%, 99.0%, 99.11% |
4 conv layers, dense-dropout-dense-dropout-dense, sgd | 99.14%, 98.92% |
4 conv layers, dense-dropout-dense-dropout-dense, sgd (activations right after the conv layers) | 99.36%, 99.28% |
4 conv w/ activations, dense-dropout-dense-dropout-dense, sgd (w/ batch normalization) | 99.58%, 99.39%, 99.33% |
4 conv w/ activations, dense-dropout-dense-dropout-dense-dropout, sgd (w/ batch normalization) | 99.35%, 98.94% |
4 conv w/ activations, dense-dropout-dense-dropout-dense-dropout, sgd (w/ batch normalization + L2 regularization) | 97.34%, 98.68% |
4 conv w/ activations, dense-dropout-dense-dropout-dense-dropout, sgd (w/ batch normalization + L1 regularization) | 87.51%, 79.44% |
Source code for CNN version: https://tzeny.ddns.net:4430/Tzeny/udemy-zero-to-deep-learning/blob/master/course/MnistConv.ipynb
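The exact architectures are in the linked notebook; as a rough illustration only, here is a minimal sketch of what a "3 conv layers w/ dropout, 2 dense, sgd" variant could look like (the filter counts, kernel sizes, and dropout rates are my own assumptions, not necessarily what the notebook uses):
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.datasets import mnist
import keras

(x_train, y_train), (x_test, y_test) = mnist.load_data()
# conv layers keep the 2D image structure, so reshape to (samples, 28, 28, 1)
# and scale the uint8 pixel values to [0, 1]
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255
x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255
y_train_labels = keras.utils.to_categorical(y_train, num_classes=10)
y_test_labels = keras.utils.to_categorical(y_test, num_classes=10)

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    Conv2D(32, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Dropout(0.25),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.25),
    Dense(10, activation='softmax'),
])
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
model.fit(x_train, y_train_labels, validation_data=(x_test, y_test_labels), epochs=10, verbose=1)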
Preprocessing the dataset
When you import the dataset, x_train is a 3D array of shape (60000, 28, 28) and x_test is a 3D array of shape (10000, 28, 28). I reshaped them into 2D arrays of shapes (60000, 784) and (10000, 784) respectively.
Next we need to take the labels in the y_train and y_test vectors and map them to binary class matrices. For this we will use the to_categorical function from Keras2:
# dataset:
#   x_train: 60000 28x28 uint8-encoded images
#   y_train: 60000 labels in [0-9]
#   x_test:  10000 28x28 uint8-encoded images
#   y_test:  10000 labels in [0-9]
# reshape the images into flat 784-element vectors so the dense network can process them
x_train = np.reshape(x_train, (60000, 784))
x_test = np.reshape(x_test, (10000, 784))
# one-hot encode y_train and y_test into binary class matrices with 10 columns,
# matching the 10-unit softmax output layer
y_train_labels = keras.utils.to_categorical(y_train, num_classes=10)
y_test_labels = keras.utils.to_categorical(y_test, num_classes=10)
Full code
import matplotlib.pyplot as plt
import numpy as np
import keras
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Dropout
from keras.utils import plot_model
from keras.datasets import mnist
import time
from datetime import datetime
max_epoch=50
optimizer='adam'
dropout_ratio = 0.2
neurons_in_layers = 64
# will be used to timestamp all saved model and plot files
date = time.strftime('%Y-%m-%d-%H:%M')
#load dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# dataset:
#   x_train: 60000 28x28 uint8-encoded images
#   y_train: 60000 labels in [0-9]
#   x_test:  10000 28x28 uint8-encoded images
#   y_test:  10000 labels in [0-9]
# reshape the images into flat 784-element vectors so the dense network can process them
x_train = np.reshape(x_train, (60000, 784))
x_test = np.reshape(x_test, (10000, 784))
# one-hot encode y_train and y_test into binary class matrices with 10 columns,
# matching the 10-unit softmax output layer
y_train_labels = keras.utils.to_categorical(y_train, num_classes=10)
y_test_labels = keras.utils.to_categorical(y_test, num_classes=10)
model = Sequential([
Dense(neurons_in_layers, input_shape=(784,)),
Activation('relu'),
Dense(neurons_in_layers),
Activation('relu'),
Dense(neurons_in_layers),
Activation('relu'),
Dense(neurons_in_layers),
Activation('relu'),
Dense(neurons_in_layers),
Activation('relu'),
Dropout(dropout_ratio),
Dense(10),
Activation('softmax'),
])
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
plot_model(model, to_file="models-png/" + date + '_model.png', show_shapes=True)
history = model.fit(x_train, y_train_labels, validation_data=(x_test, y_test_labels), epochs=max_epoch, verbose=1)
#save model
model.save_weights("models/" + optimizer + "_opt-" + str(max_epoch) + "_eps-" + date + ".h5")
print("Saved model to disk")
#plot data
# list all data in history
print(history.history.keys())
# summarize history for accuracy
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
# plot 3 reference lines so that we can better judge our model's accuracy
plt.axhline(y=1, color='g', linestyle='--')
plt.axhline(y=0.95, color='orange', linestyle='--')
plt.axhline(y=0.9, color='r', linestyle='--')
plt.title('model - accuracy and loss')
plt.ylabel('accuracy/loss')
plt.xlabel('epoch')
plt.legend(['train_acc', 'test_acc', 'train_loss', 'test_loss'], loc='upper left')
plt.savefig(optimizer + "_opt-" + str(max_epoch) + "_eps-" + date +'_score.png', bbox_inches='tight')
print("Saved plot to disk")
Displaying the result
In order to display the results I took an online piece of code3 and modified it to plot the training accuracy, test set accuracy, training loss, and test loss, and then save the plot to a file.
Model used to get above results
Try other architectures
The problem with this architecture is that it has started overfitting the test set.
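One way to mitigate this (not something I did in the code above) is to hold out a validation split from the training data and use early stopping on it, so the test set is only touched once for the final evaluation; a minimal sketch:
from keras.callbacks import EarlyStopping

# use 10% of the training data for validation instead of the test set,
# and stop training when the validation loss stops improving
early_stop = EarlyStopping(monitor='val_loss', patience=3)
history = model.fit(x_train, y_train_labels,
                    validation_split=0.1,
                    epochs=max_epoch,
                    callbacks=[early_stop],
                    verbose=1)

# evaluate on the untouched test set only once, at the end
test_loss, test_acc = model.evaluate(x_test, y_test_labels, verbose=0)
print("Test accuracy: %.4f" % test_acc)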