Digit Sequence Recognition using Deep Learning
Note: This project is outdated. There are better models that we can build in line with current standards. While I will be looking to update this soon, I don't recommend using this code in your current image recognition projects with deep learning.
In this project, we will design and implement a deep learning model that learns to recognize sequences of digits. We will train the model using synthetic data generated by concatenating character images from MNIST.
To produce synthetic digit sequences, we will limit sequences to at most five digits and put five classifiers on top of our deep network, one per digit position. We will add an extra ‘blank’ class to account for sequences shorter than five digits.
We will use Keras to implement the model. You can read more about Keras at keras.io.
Implementation
Let's start by importing the modules we'll require for this project.
```python
#Module Imports
from __future__ import print_function
import random
from os import listdir
import glob

import numpy as np
from scipy import misc
import tensorflow as tf
import h5py

from keras.datasets import mnist
from keras.utils import np_utils

import matplotlib.pyplot as plt
%matplotlib inline

#Seeding Python's random module so that the synthetic datasets are reproducible
#(Keras weight initialisation is not seeded here)
random.seed(101)

#Setting variables for MNIST image dimensions
mnist_image_height = 28
mnist_image_width = 28

#Import MNIST data from keras
(X_train, y_train), (X_test, y_test) = mnist.load_data()

#Checking the downloaded data
print("Shape of training dataset: {}".format(np.shape(X_train)))
print("Shape of test dataset: {}".format(np.shape(X_test)))

plt.figure()
plt.imshow(X_train[0], cmap='gray')

print("Label for image: {}".format(y_train[0]))
```
```
Shape of training dataset: (60000, 28, 28)
Shape of test dataset: (10000, 28, 28)
Label for image: 5
```
Building synthetic data
The MNIST dataset is very popular for beginner deep learning projects. So, to add a twist to the tale, we're going to recognize images that contain 1 to 5 digits. This requires changing the architecture of our deep learning model, but first we need to generate the dataset.
To generate the synthetic training data, we start by randomly picking 1 to 5 individual digits from the MNIST training set. The individual images are then stacked side by side, and blank images pad the sequence to 5 positions when there are fewer than 5 digits. This approach lets us generate as many training examples as we like; we'll build 60,000 of them.
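A minimal sketch of the stitching step, using random arrays as stand-ins for MNIST digits (`stitch_digits` is a hypothetical helper, not part of the project code). Each digit is 28×28 and each blank is an all-zero 28×28 tile, so padding to five tiles always yields a 28×140 strip before resizing:

```python
import numpy as np

DIGIT_H, DIGIT_W, MAX_DIGITS = 28, 28, 5

def stitch_digits(digits):
    """Place 1 to 5 digit images side by side, padding with all-zero blank tiles.

    Hypothetical helper for illustration; `digits` is a list of 28x28 arrays.
    """
    if not 1 <= len(digits) <= MAX_DIGITS:
        raise ValueError("expected between 1 and 5 digits")
    blanks = [np.zeros((DIGIT_H, DIGIT_W)) for _ in range(MAX_DIGITS - len(digits))]
    # Horizontal concatenation keeps the height at 28 and adds 28 columns per tile.
    return np.hstack(list(digits) + blanks)

# Random stand-ins for three MNIST digits (not real data).
rng = np.random.default_rng(0)
fake_digits = [rng.integers(0, 256, size=(DIGIT_H, DIGIT_W)) for _ in range(3)]
strip = stitch_digits(fake_digits)
print(strip.shape)  # (28, 140)
```

Because every strip is 140 pixels wide regardless of how many real digits it holds, resizing it to 64×64 squeezes each tile to roughly 13 pixels wide (64/5 = 12.8).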
While concatenating the images, we also build the label for each image: the labels of the individual digits are collected into a tuple of 5, with labels 0-9 for digits 0-9 and 10 for a blank.
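The label side of the same step, as a sketch (`pad_label` is a hypothetical helper): the digit labels are padded with the blank class 10 until the tuple has five entries, so the four-digit sequence 7, 1, 9, 7 becomes (7, 1, 9, 7, 10).

```python
BLANK = 10       # class id used for an empty digit position
MAX_DIGITS = 5   # one classifier per position

def pad_label(digits):
    """Return a 5-tuple of class ids, filling unused positions with the blank class."""
    if not 1 <= len(digits) <= MAX_DIGITS:
        raise ValueError("expected between 1 and 5 digits")
    return tuple(digits) + (BLANK,) * (MAX_DIGITS - len(digits))

print(pad_label([7, 1, 9, 7]))  # (7, 1, 9, 7, 10)
```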
The test data is built the same way, but with individual digits drawn from the MNIST test set, giving 10,000 synthetic test images.
Let's write a function that does this.
```python
def build_synth_data(data,labels,dataset_size):
    
    #Define synthetic image dimensions
    synth_img_height = 64
    synth_img_width = 64
    
    #Define synthetic data
    synth_data = np.ndarray(shape=(dataset_size,synth_img_height,synth_img_width),
                            dtype=np.float32)
    
    #Define synthetic labels
    synth_labels = []
    
    #Loop until the synthetic dataset reaches the requested size
    for i in range(0,dataset_size):
        
        #Pick a random number of digits (1 to 5) for this image
        num_digits = random.randint(1,5)
        
        #Randomly sample indices to extract digits and their labels
        s_indices = [random.randint(0,len(data)-1) for p in range(0,num_digits)]
        
        #Stitch the images together. Use the data passed in (not the global X_train)
        #so that the test set is built from MNIST test digits
        new_image = np.hstack([data[index] for index in s_indices])
        #Stitch the labels together
        new_label = [labels[index] for index in s_indices]
        
        #Pad with blank images and blank labels (class 10) up to 5 positions
        for j in range(0,5-num_digits):
            new_image = np.hstack([new_image,np.zeros(shape=(mnist_image_height,
                                                             mnist_image_width))])
            new_label.append(10)
        
        #Resize image to 64x64 (scipy.misc.imresize only exists in SciPy < 1.3)
        new_image = misc.imresize(new_image,(64,64))
        
        #Assign the image to synth_data
        synth_data[i,:,:] = new_image
        
        #Assign the label to synth_labels
        synth_labels.append(tuple(new_label))
    
    #Return the synthetic dataset
    return synth_data,synth_labels

#Building the training dataset
X_synth_train,y_synth_train = build_synth_data(X_train,y_train,60000)

#Building the test dataset
X_synth_test,y_synth_test = build_synth_data(X_test,y_test,10000)

#Checking a sample
plt.figure()
plt.imshow(X_synth_train[232], cmap='gray')

y_synth_train[232]
```
(7, 1, 9, 7, 10)
Things work as expected. Let's prepare the dataset and labels so that Keras can handle them.
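A portability caveat for `build_synth_data`: `scipy.misc.imresize` was deprecated in SciPy 1.0 and removed in SciPy 1.3, so the resize step fails on current SciPy versions. One way around it, sketched below, is a plain NumPy nearest-neighbour resize (`resize_nearest` is a hypothetical helper). This is a swapped-in technique: `imresize` defaulted to bilinear interpolation and returned a uint8 image, so pixel values will differ slightly; an interpolating resize from a library such as Pillow would be a closer match.

```python
import numpy as np

def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of a 2D array (hypothetical stand-in for misc.imresize)."""
    in_h, in_w = image.shape
    # Map each output row/column back to the source row/column it samples.
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return image[rows[:, None], cols[None, :]]

# A 28x140 strip whose first tile is fully lit and whose other four tiles are blank.
strip = np.zeros((28, 140))
strip[:, :28] = 255.0
small = resize_nearest(strip, 64, 64)
print(small.shape)  # (64, 64)
```

Inside `build_synth_data`, the call would become `new_image = resize_nearest(new_image, 64, 64)`; the output keeps the input's dtype and 0-255 range, which `prep_data_keras` later scales to [0, 1].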
Preparatory Preprocessing
Preprocessing Labels for model
The labels are encoded as "One Hot" arrays to make them compatible with Keras. Because our model has 5 classifiers, we need 5 such One Hot arrays, one for each digit position in the image.
```python
#Converting labels to one-hot: a list of 5 arrays (one per digit position),
#each of shape (set_size, classes)
possible_classes = 11

def convert_labels(labels):
    
    #As per Keras conventions, the multiple labels need to be of the form [array_digit1,...5]
    #Each digit array will be of shape (len(labels),11)
    
    #Code below could be more compact, but it is kept explicit here.
    
    #Declare output ndarrays: 5 for digit positions, 11 for possible classes
    dig0_arr = np.ndarray(shape=(len(labels),possible_classes))
    dig1_arr = np.ndarray(shape=(len(labels),possible_classes))
    dig2_arr = np.ndarray(shape=(len(labels),possible_classes))
    dig3_arr = np.ndarray(shape=(len(labels),possible_classes))
    dig4_arr = np.ndarray(shape=(len(labels),possible_classes))
    
    for index,label in enumerate(labels):
        
        #Using np_utils from keras to OHE the labels in the image
        dig0_arr[index,:] = np_utils.to_categorical(label[0],possible_classes)
        dig1_arr[index,:] = np_utils.to_categorical(label[1],possible_classes)
        dig2_arr[index,:] = np_utils.to_categorical(label[2],possible_classes)
        dig3_arr[index,:] = np_utils.to_categorical(label[3],possible_classes)
        dig4_arr[index,:] = np_utils.to_categorical(label[4],possible_classes)
    
    return [dig0_arr,dig1_arr,dig2_arr,dig3_arr,dig4_arr]

train_labels = convert_labels(y_synth_train)
test_labels = convert_labels(y_synth_test)

#Checking the shape of the OHE array for the first digit position
np.shape(train_labels[0])
```
(60000, 11)
```python
np_utils.to_categorical(y_synth_train[234][0],11)
```
array([[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.]])
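`convert_labels` builds each of the five arrays with an explicit Python loop. As one possible compact alternative (a sketch, not the project's code; `convert_labels_vectorized` is a hypothetical name), indexing an 11×11 identity matrix with the (n, 5) label array one-hot encodes every position at once, and slicing along the position axis gives the five (n, 11) target arrays Keras expects:

```python
import numpy as np

POSSIBLE_CLASSES = 11  # digits 0-9 plus the blank class 10

def convert_labels_vectorized(labels):
    """One-hot encode a list of 5-tuples into five (n, 11) arrays, one per digit position."""
    label_array = np.asarray(labels, dtype=np.int64)                   # shape (n, 5)
    one_hot = np.eye(POSSIBLE_CLASSES, dtype=np.float32)[label_array]  # shape (n, 5, 11)
    # Keras expects a list with one target array per output head.
    return [one_hot[:, position, :] for position in range(label_array.shape[1])]

example = convert_labels_vectorized([(7, 1, 9, 7, 10), (0, 5, 7, 10, 10)])
print(len(example), example[0].shape)  # 5 (2, 11)
```

This produces float32 arrays, whereas `convert_labels` allocates float64 arrays via `np.ndarray`; the one-hot values themselves are identical.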
Preprocessing Images for model
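The function below pre-processes the images so that Keras can handle them: it adds a trailing channel axis (the layout expected with TensorFlow as the backend) and scales pixel values to the range 0 to 1.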
```python
def prep_data_keras(img_data):
    
    #Reshaping data for keras, with tensorflow as backend
    img_data = img_data.reshape(len(img_data),64,64,1)
    
    #Converting everything to floats
    img_data = img_data.astype('float32')
    
    #Normalizing values between 0 and 1
    img_data /= 255
    
    return img_data

train_images = prep_data_keras(X_synth_train)
test_images = prep_data_keras(X_synth_test)

np.shape(train_images)
```
(60000, 64, 64, 1)
```python
np.shape(test_images)
```
(10000, 64, 64, 1)
Model Building
```python
#Importing relevant keras modules
from keras.models import Sequential, Model
from keras.layers import Dense, Dropout, Activation, Flatten, Input
from keras.layers import Convolution2D, MaxPooling2D
```
We're going to use a Convolutional Neural Network (CNN).
The network starts with a 2D convolutional layer, and every convolutional layer is followed by a ReLU activation.
After the second convolutional layer + ReLU, we add 2D max pooling and dropout to reduce overfitting. A flattening layer then turns the feature maps into a vector, which feeds a 128-unit fully connected ReLU layer, again followed by dropout. Finally, five Dense classification layers, one per digit position and each with as many units as there are classes (11 for us), use softmax activations to give the probability of each class.
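To make the architecture concrete, here is a worked trace of the tensor shapes and parameter counts, using the hyperparameters from the model code in this section: 32 filters, 3×3 kernels, 'same' padding on the first convolution and the Keras default 'valid' padding on the second, 2×2 pooling, a 128-unit dense layer, and five 11-way heads. The helper `conv_out` is hypothetical, written only for this calculation.

```python
def conv_out(size, kernel, padding):
    """Spatial size after a stride-1 convolution ('same' keeps size, 'valid' shrinks it)."""
    return size if padding == "same" else size - kernel + 1

side = 64
side = conv_out(side, 3, "same")   # first conv, border_mode='same' -> 64
side = conv_out(side, 3, "valid")  # second conv, default 'valid' -> 62
side = side // 2                   # 2x2 max pooling -> 31
flat = side * side * 32            # Flatten over 32 feature maps -> 30752

params = {
    "conv1": 3 * 3 * 1 * 32 + 32,   # weights + biases = 320
    "conv2": 3 * 3 * 32 * 32 + 32,  # 9248
    "dense": flat * 128 + 128,      # 3936384
    "heads": 5 * (128 * 11 + 11),   # five softmax classifiers = 7095
}
print(side, flat, sum(params.values()))  # 31 30752 3953047
```

The dense layer after `Flatten` holds over 99% of the roughly 3.95 million parameters, so it dominates the model's size; Dropout, pooling and activation layers add no parameters.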
```python
#Building the model

batch_size = 128
nb_classes = 11
nb_epoch = 12

#image input dimensions
img_rows = 64
img_cols = 64
img_channels = 1

#number of convolution filters to use
nb_filters = 32
# size of pooling area for max pooling
pool_size = (2, 2)
# convolution kernel size
kernel_size = (3, 3)

#defining the input
inputs = Input(shape=(img_rows,img_cols,img_channels))

#Convolutional part adapted from a Keras example that worked well for single digits
#Note: Keras 1.x API (Keras 2 renamed these to Conv2D(..., padding=...),
#Model(inputs=, outputs=) and fit(epochs=))
cov = Convolution2D(nb_filters,kernel_size[0],kernel_size[1],border_mode='same')(inputs)
cov = Activation('relu')(cov)
cov = Convolution2D(nb_filters,kernel_size[0],kernel_size[1])(cov)
cov = Activation('relu')(cov)
cov = MaxPooling2D(pool_size=pool_size)(cov)
cov = Dropout(0.25)(cov)
cov_out = Flatten()(cov)

#Dense Layers
cov2 = Dense(128, activation='relu')(cov_out)
cov2 = Dropout(0.5)(cov2)

#Prediction layers, one softmax classifier per digit position
c0 = Dense(nb_classes, activation='softmax')(cov2)
c1 = Dense(nb_classes, activation='softmax')(cov2)
c2 = Dense(nb_classes, activation='softmax')(cov2)
c3 = Dense(nb_classes, activation='softmax')(cov2)
c4 = Dense(nb_classes, activation='softmax')(cov2)

#Defining the model
model = Model(input=inputs,output=[c0,c1,c2,c3,c4])

#Compiling the model
model.compile(loss='categorical_crossentropy',optimizer='adam',metrics=['accuracy'])

#Fitting the model
model.fit(train_images,train_labels,batch_size=batch_size,nb_epoch=nb_epoch,verbose=1,
          validation_data=(test_images, test_labels))
```
```
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
60000/60000 [==============================] - 944s - loss: 3.7553 - dense_4_loss: 0.9496 - dense_5_loss: 0.9061 - dense_6_loss: 0.7953 - dense_7_loss: 0.6590 - dense_8_loss: 0.4454 - dense_4_acc: 0.6825 - dense_5_acc: 0.7028 - dense_6_acc: 0.7378 - dense_7_acc: 0.7884 - dense_8_acc: 0.8626 - val_loss: 1.0372 - val_dense_4_loss: 0.2358 - val_dense_5_loss: 0.2332 - val_dense_6_loss: 0.2181 - val_dense_7_loss: 0.1906 - val_dense_8_loss: 0.1596 - val_dense_4_acc: 0.9471 - val_dense_5_acc: 0.9511 - val_dense_6_acc: 0.9551 - val_dense_7_acc: 0.9600 - val_dense_8_acc: 0.9570
Epoch 2/12
60000/60000 [==============================] - 923s - loss: 2.3002 - dense_4_loss: 0.5586 - dense_5_loss: 0.5510 - dense_6_loss: 0.4836 - dense_7_loss: 0.4172 - dense_8_loss: 0.2899 - dense_4_acc: 0.8151 - dense_5_acc: 0.8167 - dense_6_acc: 0.8378 - dense_7_acc: 0.8628 - dense_8_acc: 0.9041 - val_loss: 0.6792 - val_dense_4_loss: 0.1599 - val_dense_5_loss: 0.1576 - val_dense_6_loss: 0.1411 - val_dense_7_loss: 0.1201 - val_dense_8_loss: 0.1005 - val_dense_4_acc: 0.9607 - val_dense_5_acc: 0.9678 - val_dense_6_acc: 0.9688 - val_dense_7_acc: 0.9757 - val_dense_8_acc: 0.9795
......
Epoch 11/12
60000/60000 [==============================] - 989s - loss: 0.9621 - dense_4_loss: 0.2213 - dense_5_loss: 0.2163 - dense_6_loss: 0.2024 - dense_7_loss: 0.1902 - dense_8_loss: 0.1319 - dense_4_acc: 0.9194 - dense_5_acc: 0.9226 - dense_6_acc: 0.9249 - dense_7_acc: 0.9315 - dense_8_acc: 0.9522 - val_loss: 0.2832 - val_dense_4_loss: 0.0675 - val_dense_5_loss: 0.0698 - val_dense_6_loss: 0.0516 - val_dense_7_loss: 0.0445 - val_dense_8_loss: 0.0498 - val_dense_4_acc: 0.9801 - val_dense_5_acc: 0.9806 - val_dense_6_acc: 0.9847 - val_dense_7_acc: 0.9871 - val_dense_8_acc: 0.9885
Epoch 12/12
60000/60000 [==============================] - 1003s - loss: 0.9082 - dense_4_loss: 0.2061 - dense_5_loss: 0.2059 - dense_6_loss: 0.1902 - dense_7_loss: 0.1798 - dense_8_loss: 0.1262 - dense_4_acc: 0.9246 - dense_5_acc: 0.9250 - dense_6_acc: 0.9273 - dense_7_acc: 0.9345 - dense_8_acc: 0.9545 - val_loss: 0.2760 - val_dense_4_loss: 0.0670 - val_dense_5_loss: 0.0719 - val_dense_6_loss: 0.0525 - val_dense_7_loss: 0.0404 - val_dense_8_loss: 0.0442 - val_dense_4_acc: 0.9810 - val_dense_5_acc: 0.9812 - val_dense_6_acc: 0.9846 - val_dense_7_acc: 0.9888 - val_dense_8_acc: 0.9901

<keras.callbacks.History at 0x1228eaa90>
```
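In the training log, `dense_4` to `dense_8` are the five softmax heads. Since `compile` was called without `loss_weights`, each head gets weight 1 and Keras reports `loss` as the sum of the five per-head losses. As a quick sanity check, adding up the epoch 1 head losses copied from the log reproduces the logged totals to within rounding of the four-decimal values:

```python
# Per-head losses copied from the epoch 1 line of the training log.
train_head_losses = [0.9496, 0.9061, 0.7953, 0.6590, 0.4454]
val_head_losses = [0.2358, 0.2332, 0.2181, 0.1906, 0.1596]

print(round(sum(train_head_losses), 4))  # 3.7554 (logged loss: 3.7553)
print(round(sum(val_head_losses), 4))    # 1.0373 (logged val_loss: 1.0372)
```

This also explains why the total loss looks large compared with the per-head values: it aggregates five classification problems.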
```python
predictions = model.predict(test_images)

np.shape(predictions)
```
(5, 10000, 11)
```python
len(predictions[0])
```
10000
```python
np.shape(test_labels)
```
(5, 10000, 11)
We'll define a custom function to calculate accuracy both for individual digit positions and for complete sequences. A sequence counts as correct only if all five positions, including blanks, are predicted correctly.
```python
def calculate_acc(predictions,real_labels):
    
    individual_counter = 0
    global_sequence_counter = 0
    num_images = len(predictions[0])
    for i in range(0,num_images):
        #Reset sequence counter at the start of each image
        sequence_counter = 0
        
        for j in range(0,5):
            if np.argmax(predictions[j][i]) == np.argmax(real_labels[j][i]):
                individual_counter += 1
                sequence_counter +=1
        
        if sequence_counter == 5:
            global_sequence_counter += 1
    
    #5 digit positions per image; derive denominators from the data instead of hard-coding 50000/10000
    ind_accuracy = individual_counter/float(5*num_images)
    global_accuracy = global_sequence_counter/float(num_images)
    
    return ind_accuracy,global_accuracy

ind_acc,glob_acc = calculate_acc(predictions,test_labels)

print("The individual accuracy is {} %".format(ind_acc*100))
print("The sequence prediction accuracy is {} %".format(glob_acc*100))
```
```
The individual accuracy is 98.514 %
The sequence prediction accuracy is 92.86 %
```
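`calculate_acc` loops over every image in Python. The same two metrics can be computed with NumPy by stacking the five outputs into an array of shape (5, n, 11), taking the argmax over classes, and comparing: the per-position accuracy is the mean over all 5 × n comparisons, and the sequence accuracy is the fraction of images whose five positions all match. `sequence_accuracy` is a hypothetical name, and the toy labels below are made up for illustration:

```python
import numpy as np

def sequence_accuracy(predictions, real_labels):
    """Return (per-position accuracy, whole-sequence accuracy).

    Both arguments are lists of five (n, 11) arrays, one per output head.
    """
    predicted = np.argmax(np.stack(predictions), axis=-1)  # shape (5, n)
    actual = np.argmax(np.stack(real_labels), axis=-1)     # shape (5, n)
    correct = predicted == actual
    # Mean over every (position, image) pair; then over images where all 5 match.
    return correct.mean(), correct.all(axis=0).mean()

eye = np.eye(11)
# Two toy images: labels (1, 3, blank x3) and (2, blank x4).
labels = [eye[[1, 2]], eye[[3, 10]], eye[[10, 10]], eye[[10, 10]], eye[[10, 10]]]
# The second image gets one position wrong (4 instead of blank).
preds = [eye[[1, 2]], eye[[3, 4]], eye[[10, 10]], eye[[10, 10]], eye[[10, 10]]]

ind, seq = sequence_accuracy(preds, labels)
print(ind, seq)  # 0.9 0.5
```

Called as `sequence_accuracy(predictions, test_labels)`, it would return the same two values as `calculate_acc`.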
```python
#Printing some examples of real and predicted labels
for i in random.sample(range(0,10000),5):
    
    actual_labels = []
    predicted_labels = []
    
    for j in range(0,5):
        actual_labels.append(np.argmax(test_labels[j][i]))
        predicted_labels.append(np.argmax(predictions[j][i]))
    
    print("Actual labels: {}".format(actual_labels))
    print("Predicted labels: {}\n".format(predicted_labels))
```
```
Actual labels: [0, 9, 7, 10, 10]
Predicted labels: [0, 8, 7, 10, 10]

Actual labels: [0, 5, 7, 10, 10]
Predicted labels: [0, 5, 7, 10, 10]

Actual labels: [3, 8, 1, 10, 10]
Predicted labels: [3, 8, 1, 10, 10]

Actual labels: [6, 8, 1, 10, 10]
Predicted labels: [6, 8, 1, 10, 10]

Actual labels: [7, 6, 3, 2, 9]
Predicted labels: [7, 6, 3, 2, 9]
```
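The printed label lists are easier to read as numbers. A small helper (hypothetical, not part of the project code) drops the blank class 10 and joins the remaining digits:

```python
BLANK = 10  # class id for an empty digit position

def labels_to_string(labels):
    """Turn a 5-position label list into a digit string, dropping blank positions."""
    return "".join(str(label) for label in labels if label != BLANK)

print(labels_to_string([0, 9, 7, 10, 10]))  # 097
print(labels_to_string([7, 6, 3, 2, 9]))    # 76329
```

Dropping blanks wherever they appear is safe for this synthetic data, where blanks are only ever appended after the real digits; a prediction with a blank in the middle would silently collapse into a shorter number.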
The model achieves good accuracy: about 98.5% for individual digit positions (digits or blanks) and about 92.9% for whole sequences.