Sajal Sharma

Digit Sequence Recognition using Deep Learning

06.03.2017 - Python, Keras, Deep Learning, CNN, Computer Vision

Note: This project is outdated. Better models can now be built in line with current standards. While I will be looking to update this soon, I don't recommend using this code in current projects related to image recognition with deep learning.


In this project, we will design and implement a deep learning model that learns to recognize sequences of digits. We will train the model using synthetic data generated by concatenating character images from MNIST.

To produce synthetic digit sequences, we will limit sequences to at most five digits and put five classifiers on top of the deep network, one for each digit position. We will add an extra 'blank' class to account for sequences shorter than five digits.

We will use Keras to implement the model. You can read more about Keras at keras.io.

Implementation

Let's start by importing the modules we'll require for this project.

# Module imports
import random

import numpy as np
from PIL import Image  # Pillow, used to resize the stitched images

from keras.datasets import mnist
from keras.utils import to_categorical

import matplotlib.pyplot as plt
%matplotlib inline

# Seed Python's RNG so that the synthetic datasets are reproducible
# (this does not seed Keras/TensorFlow weight initialization)
random.seed(101)

# MNIST image dimensions
mnist_image_height = 28
mnist_image_width = 28

# Import MNIST data from Keras
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Checking the downloaded data
print("Shape of training dataset: {}".format(np.shape(X_train)))
print("Shape of test dataset: {}".format(np.shape(X_test)))

plt.figure()
plt.imshow(X_train[0], cmap='gray')

print("Label for image: {}".format(y_train[0]))
Shape of training dataset: (60000, 28, 28)
Shape of test dataset: (10000, 28, 28)
Label for image: 5


Building synthetic data

The MNIST dataset is very popular for beginner Deep Learning projects. So, to add a twist to the tale, we're going to recognize images that contain 1 to 5 digits. This requires changing the architecture of our deep learning model, but first we need to generate the dataset.

To generate the synthetic training data, we first randomly pick between 1 and 5 individual digits from the MNIST training set. The digit images are then stacked side by side, and blank (all-zero) 28×28 images fill the remaining positions when fewer than 5 digits were picked; the resulting 28×140 strip is resized to 64×64. Because digits are recombined at random, this approach can produce far more distinct training images than there are single MNIST digits. We'll build 60,000 such examples.
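The stitching and padding step can be sketched in isolation with NumPy. This is a minimal illustration that uses random arrays in place of real MNIST digits; the helper name `stitch_digits` is hypothetical, and the final resize to 64×64 is left out.

```python
import numpy as np

def stitch_digits(digits, max_len=5, height=28, width=28):
    """Hypothetical helper: place 1..max_len digit images side by side and
    pad the right-hand side with all-zero 'blank' images up to max_len."""
    if not 1 <= len(digits) <= max_len:
        raise ValueError("expected between 1 and {} digits".format(max_len))
    blanks = [np.zeros((height, width), dtype=digits[0].dtype)
              for _ in range(max_len - len(digits))]
    return np.hstack(list(digits) + blanks)

# Three fake 28x28 "digits" with pixel values in 0..255
rng = np.random.default_rng(0)
fake_digits = [rng.integers(0, 256, size=(28, 28), dtype=np.uint8) for _ in range(3)]

strip = stitch_digits(fake_digits)
print(strip.shape)          # (28, 140)
print(strip[:, 84:].max())  # 0, because positions 4 and 5 are blank
```

Every strip ends up 140 pixels wide regardless of how many digits it holds, which is why a single fixed resize target works for all examples.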

While concatenating the images, we'll also build the label for each image: the labels of the individual digits are arranged in a tuple of length 5, one entry per position. Labels 0-9 are used for digits 0-9, and 10 indicates a blank.
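The label layout is easiest to see on a couple of examples. Below is a minimal sketch, assuming blanks are always appended at the end as in this project; `encode_label`, `BLANK` and `MAX_DIGITS` are hypothetical names used only for illustration.

```python
BLANK = 10      # class index used for an empty position
MAX_DIGITS = 5  # number of classifier heads / label positions

def encode_label(digits):
    """Hypothetical helper: turn a list of 1-5 digits into a 5-tuple,
    padding the unused trailing positions with the blank class."""
    if not 1 <= len(digits) <= MAX_DIGITS:
        raise ValueError("expected between 1 and 5 digits")
    return tuple(digits) + (BLANK,) * (MAX_DIGITS - len(digits))

print(encode_label([7, 1, 9, 7]))  # (7, 1, 9, 7, 10)
print(encode_label([3]))           # (3, 10, 10, 10, 10)
```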

The same approach is used to build the test data, but with individual digits drawn from the MNIST test set, giving 10,000 synthetic test images.

Let's write a function that does this.

def build_synth_data(data, labels, dataset_size):

    # Define synthetic image dimensions
    synth_img_height = 64
    synth_img_width = 64

    # Define synthetic data
    synth_data = np.ndarray(shape=(dataset_size, synth_img_height, synth_img_width),
                            dtype=np.float32)

    # Define synthetic labels
    synth_labels = []

    # Loop until the synthetic dataset reaches the requested size
    for i in range(0, dataset_size):

        # Pick a random number of digits (1 to 5) for this image
        num_digits = random.randint(1, 5)

        # Randomly sample indices to extract digits + labels from the given set
        s_indices = [random.randint(0, len(data)-1) for p in range(0, num_digits)]

        # Stitch the images together side by side (from `data`, not always X_train)
        new_image = np.hstack([data[index] for index in s_indices])
        # Stitch the labels together
        new_label = [int(labels[index]) for index in s_indices]

        # Pad with (5 - num_digits) blank images and blank labels
        for j in range(0, 5-num_digits):
            new_image = np.hstack([new_image, np.zeros(shape=(mnist_image_height,
                                                              mnist_image_width))])
            new_label.append(10)  # 10 is the 'blank' class

        # Resize the 28x140 strip to 64x64 with Pillow
        # (scipy.misc.imresize was removed in SciPy 1.3)
        new_image = Image.fromarray(new_image.astype(np.uint8)).resize(
            (synth_img_width, synth_img_height), Image.BILINEAR)

        # Assign the image to synth_data
        synth_data[i, :, :] = np.asarray(new_image)

        # Assign the label to synth_labels
        synth_labels.append(tuple(new_label))

    # Return the synthetic dataset
    return synth_data, synth_labels

# Building the training dataset
X_synth_train, y_synth_train = build_synth_data(X_train, y_train, 60000)

# Building the test dataset
X_synth_test, y_synth_test = build_synth_data(X_test, y_test, 10000)

# Checking a sample
plt.figure()
plt.imshow(X_synth_train[232], cmap='gray')

y_synth_train[232]
(7, 1, 9, 7, 10)


The image and its label tuple match, so the generator works as expected. Next, let's prepare the dataset and labels so that Keras can handle them.

Preparatory Preprocessing

Preprocessing Labels for model

The labels are going to be encoded as "One Hot" arrays to make them compatible with Keras. Note that, as our Deep Learning model will have 5 classifiers, we'll need 5 such One Hot arrays, one for each digit position in the image.
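To make the target layout concrete, here is a NumPy-only sketch on three example label tuples. Indexing an identity matrix with integer labels is a common stand-in for `to_categorical`; it is used here purely for illustration and is not what the project code calls.

```python
import numpy as np

num_classes = 11  # digits 0-9 plus the blank class 10

# Three example label tuples (10 = blank)
labels = np.array([(7, 1, 9, 7, 10),
                   (3, 10, 10, 10, 10),
                   (0, 5, 7, 10, 10)])

# One (num_images, num_classes) array per digit position, collected in a list of 5
one_hot_per_position = [np.eye(num_classes)[labels[:, pos]]
                        for pos in range(labels.shape[1])]

print(len(one_hot_per_position))              # 5
print(one_hot_per_position[0].shape)          # (3, 11)
print(one_hot_per_position[1][0].argmax())    # 1
```

Each list entry becomes the target for one of the five softmax heads, and every row contains exactly one 1.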

# Converting labels to one-hot representations: one (set_size, classes) array per digit position
possible_classes = 11

def convert_labels(labels):

    # As per Keras conventions for multi-output models, the targets are a list
    # [array_digit1, ..., array_digit5], each of shape (number of images, 11)
    label_array = np.array(labels)  # shape (number of images, 5)

    # One-hot encode each digit position with Keras' to_categorical
    return [to_categorical(label_array[:, pos], possible_classes)
            for pos in range(label_array.shape[1])]

train_labels = convert_labels(y_synth_train)
test_labels = convert_labels(y_synth_test)

# Checking the shape of the one-hot array for the first digit position
np.shape(train_labels[0])
(60000, 11)
# One-hot encoding of the first digit of sample 234
to_categorical([y_synth_train[234][0]], 11)
array([[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.]])

Preprocessing Images for model

The function below pre-processes the images so that Keras can handle them: it adds a channel dimension and scales pixel values to the range [0, 1].

def prep_data_keras(img_data):

    # Reshaping data for Keras with the TensorFlow backend (channels last)
    img_data = img_data.reshape(len(img_data), 64, 64, 1)

    # Converting everything to floats
    img_data = img_data.astype('float32')

    # Normalizing values between 0 and 1
    img_data /= 255

    return img_data

train_images = prep_data_keras(X_synth_train)
test_images = prep_data_keras(X_synth_test)

np.shape(train_images)
(60000, 64, 64, 1)
np.shape(test_images)
(10000, 64, 64, 1)

Model Building

# Importing relevant Keras modules
from keras.models import Model
from keras.layers import Dense, Dropout, Activation, Flatten, Input
from keras.layers import Convolution2D, MaxPooling2D

We're going to use a Convolutional Neural Network (CNN) for our network.

The network starts with a 2D convolutional layer, and every convolutional layer is followed by a ReLU activation.

After the second convolutional layer + ReLU, we add 2D max pooling and dropout to make the model more robust to overfitting. A Flatten layer then turns the feature maps into a vector, which passes through a 128-unit Dense layer with ReLU and dropout. The classification layers are five Dense layers, one per digit position, each with as many units as there are classes (11 for us) and a softmax activation that gives the probability of each class.
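As a sanity check on this architecture, the feature-map sizes and parameter counts can be worked out by hand. The sketch assumes the settings used in the model code (3×3 kernels, 32 filters, 'same' padding on the first convolution and Keras's default 'valid' padding on the second, 2×2 max pooling, a 128-unit Dense layer and five 11-way heads); the helper functions are hypothetical, and Keras's `model.summary()` remains the authoritative check.

```python
def conv_out(size, kernel, padding):
    """Output height/width of a stride-1 convolution."""
    return size if padding == "same" else size - kernel + 1

def conv_params(kernel, in_channels, filters):
    # kernel weights for every input channel, plus one bias per filter
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(inputs, units):
    # one weight per input per unit, plus one bias per unit
    return (inputs + 1) * units

size = 64
size = conv_out(size, 3, "same")    # 64
size = conv_out(size, 3, "valid")   # 62
size = size // 2                    # 31 after 2x2 max pooling
flat = size * size * 32             # features after Flatten

params = (conv_params(3, 1, 32)          # first convolution
          + conv_params(3, 32, 32)       # second convolution
          + dense_params(flat, 128)      # shared Dense layer
          + 5 * dense_params(128, 11))   # five softmax heads
print(flat, params)  # 30752 3953047
```

Nearly all of the roughly 3.95 million parameters sit in the 128-unit Dense layer, because it connects to every one of the 30,752 flattened features.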

# Building the model

batch_size = 128
nb_classes = 11  # digits 0-9 plus the blank class
nb_epoch = 12

# Image input dimensions
img_rows = 64
img_cols = 64
img_channels = 1

# Number of convolution filters to use
nb_filters = 32
# Size of pooling area for max pooling
pool_size = (2, 2)
# Convolution kernel size
kernel_size = (3, 3)

# Defining the input
inputs = Input(shape=(img_rows, img_cols, img_channels))

# Convolutional feature extractor, adapted from the Keras MNIST CNN example
cov = Convolution2D(nb_filters, kernel_size, padding='same')(inputs)
cov = Activation('relu')(cov)
cov = Convolution2D(nb_filters, kernel_size)(cov)
cov = Activation('relu')(cov)
cov = MaxPooling2D(pool_size=pool_size)(cov)
cov = Dropout(0.25)(cov)
cov_out = Flatten()(cov)

# Shared Dense layer
cov2 = Dense(128, activation='relu')(cov_out)
cov2 = Dropout(0.5)(cov2)

# Prediction layers: one 11-way softmax classifier per digit position
outputs = [Dense(nb_classes, activation='softmax')(cov2) for _ in range(5)]

# Defining the model
model = Model(inputs=inputs, outputs=outputs)

# Compiling the model: one loss and one accuracy metric per output head
model.compile(loss=['categorical_crossentropy'] * 5, optimizer='adam',
              metrics=[['accuracy'] for _ in range(5)])

# Fitting the model (the synthetic test set doubles as validation data)
model.fit(train_images, train_labels, batch_size=batch_size, epochs=nb_epoch, verbose=1,
          validation_data=(test_images, test_labels))
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
60000/60000 [==============================] - 944s - loss: 3.7553 - dense_4_loss: 0.9496 - dense_5_loss: 0.9061 - dense_6_loss: 0.7953 - dense_7_loss: 0.6590 - dense_8_loss: 0.4454 - dense_4_acc: 0.6825 - dense_5_acc: 0.7028 - dense_6_acc: 0.7378 - dense_7_acc: 0.7884 - dense_8_acc: 0.8626 - val_loss: 1.0372 - val_dense_4_loss: 0.2358 - val_dense_5_loss: 0.2332 - val_dense_6_loss: 0.2181 - val_dense_7_loss: 0.1906 - val_dense_8_loss: 0.1596 - val_dense_4_acc: 0.9471 - val_dense_5_acc: 0.9511 - val_dense_6_acc: 0.9551 - val_dense_7_acc: 0.9600 - val_dense_8_acc: 0.9570
Epoch 2/12
60000/60000 [==============================] - 923s - loss: 2.3002 - dense_4_loss: 0.5586 - dense_5_loss: 0.5510 - dense_6_loss: 0.4836 - dense_7_loss: 0.4172 - dense_8_loss: 0.2899 - dense_4_acc: 0.8151 - dense_5_acc: 0.8167 - dense_6_acc: 0.8378 - dense_7_acc: 0.8628 - dense_8_acc: 0.9041 - val_loss: 0.6792 - val_dense_4_loss: 0.1599 - val_dense_5_loss: 0.1576 - val_dense_6_loss: 0.1411 - val_dense_7_loss: 0.1201 - val_dense_8_loss: 0.1005 - val_dense_4_acc: 0.9607 - val_dense_5_acc: 0.9678 - val_dense_6_acc: 0.9688 - val_dense_7_acc: 0.9757 - val_dense_8_acc: 0.9795
...
...
Epoch 11/12
60000/60000 [==============================] - 989s - loss: 0.9621 - dense_4_loss: 0.2213 - dense_5_loss: 0.2163 - dense_6_loss: 0.2024 - dense_7_loss: 0.1902 - dense_8_loss: 0.1319 - dense_4_acc: 0.9194 - dense_5_acc: 0.9226 - dense_6_acc: 0.9249 - dense_7_acc: 0.9315 - dense_8_acc: 0.9522 - val_loss: 0.2832 - val_dense_4_loss: 0.0675 - val_dense_5_loss: 0.0698 - val_dense_6_loss: 0.0516 - val_dense_7_loss: 0.0445 - val_dense_8_loss: 0.0498 - val_dense_4_acc: 0.9801 - val_dense_5_acc: 0.9806 - val_dense_6_acc: 0.9847 - val_dense_7_acc: 0.9871 - val_dense_8_acc: 0.9885
Epoch 12/12
60000/60000 [==============================] - 1003s - loss: 0.9082 - dense_4_loss: 0.2061 - dense_5_loss: 0.2059 - dense_6_loss: 0.1902 - dense_7_loss: 0.1798 - dense_8_loss: 0.1262 - dense_4_acc: 0.9246 - dense_5_acc: 0.9250 - dense_6_acc: 0.9273 - dense_7_acc: 0.9345 - dense_8_acc: 0.9545 - val_loss: 0.2760 - val_dense_4_loss: 0.0670 - val_dense_5_loss: 0.0719 - val_dense_6_loss: 0.0525 - val_dense_7_loss: 0.0404 - val_dense_8_loss: 0.0442 - val_dense_4_acc: 0.9810 - val_dense_5_acc: 0.9812 - val_dense_6_acc: 0.9846 - val_dense_7_acc: 0.9888 - val_dense_8_acc: 0.9901
<keras.callbacks.History at 0x1228eaa90>
predictions = model.predict(test_images)

np.shape(predictions)
(5, 10000, 11)
len(predictions[0])
10000
np.shape(test_labels)
(5, 10000, 11)

We'll define a custom function to calculate accuracy both for predicting individual digits (per position, blanks included) and for predicting complete sequences, where all five positions must be correct.

def calculate_acc(predictions, real_labels):

    num_positions = len(predictions)   # 5 digit positions
    num_images = len(predictions[0])   # number of images in the set

    individual_counter = 0
    global_sequence_counter = 0
    for i in range(num_images):
        # Reset the sequence counter at the start of each image
        sequence_counter = 0

        for j in range(num_positions):
            if np.argmax(predictions[j][i]) == np.argmax(real_labels[j][i]):
                individual_counter += 1
                sequence_counter += 1

        # A sequence counts as correct only if every position is correct
        if sequence_counter == num_positions:
            global_sequence_counter += 1

    ind_accuracy = individual_counter / float(num_positions * num_images)
    global_accuracy = global_sequence_counter / float(num_images)

    return ind_accuracy, global_accuracy

ind_acc, glob_acc = calculate_acc(predictions, test_labels)

print("The individual accuracy is {} %".format(ind_acc*100))
print("The sequence prediction accuracy is {} %".format(glob_acc*100))
The individual accuracy is 98.514 %
The sequence prediction accuracy is 92.86 %
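The same two metrics can also be computed without explicit Python loops. This is a minimal vectorized sketch, assuming `predictions` and `real_labels` are lists of equally shaped `(num_images, num_classes)` arrays as above; `calculate_acc_vectorized` is a hypothetical name, shown here on a tiny hand-made example rather than on the real model output.

```python
import numpy as np

def calculate_acc_vectorized(predictions, real_labels):
    """Per-position and whole-sequence accuracy.

    Both arguments hold one (num_images, num_classes) array per digit position.
    """
    pred_classes = np.argmax(np.asarray(predictions), axis=-1)  # (positions, images)
    true_classes = np.argmax(np.asarray(real_labels), axis=-1)  # (positions, images)
    correct = pred_classes == true_classes
    ind_accuracy = correct.mean()              # fraction of correct positions
    seq_accuracy = correct.all(axis=0).mean()  # images with every position correct
    return float(ind_accuracy), float(seq_accuracy)

# Tiny example: 2 positions x 2 images x 3 classes
preds = [np.array([[0.9, 0.1, 0.0], [0.2, 0.7, 0.1]]),
         np.array([[0.1, 0.1, 0.8], [0.6, 0.3, 0.1]])]
truth = [np.array([[1, 0, 0], [0, 1, 0]]),
         np.array([[0, 0, 1], [0, 0, 1]])]
print(calculate_acc_vectorized(preds, truth))  # (0.75, 0.5)
```

Here 3 of the 4 positions are right, but only the first image has both positions right, which is why the sequence accuracy is lower.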
# Printing some examples of real and predicted labels
for i in random.sample(range(0, 10000), 5):

    actual_labels = []
    predicted_labels = []

    for j in range(0, 5):
        actual_labels.append(int(np.argmax(test_labels[j][i])))
        predicted_labels.append(int(np.argmax(predictions[j][i])))

    print("Actual labels: {}".format(actual_labels))
    print("Predicted labels: {}\n".format(predicted_labels))
Actual labels: [0, 9, 7, 10, 10]
Predicted labels: [0, 8, 7, 10, 10]
Actual labels: [0, 5, 7, 10, 10]
Predicted labels: [0, 5, 7, 10, 10]
Actual labels: [3, 8, 1, 10, 10]
Predicted labels: [3, 8, 1, 10, 10]
Actual labels: [6, 8, 1, 10, 10]
Predicted labels: [6, 8, 1, 10, 10]
Actual labels: [7, 6, 3, 2, 9]
Predicted labels: [7, 6, 3, 2, 9]
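To read predictions as numbers rather than lists of class indices, the five softmax outputs can be decoded into digit strings. This is a minimal sketch with a hypothetical `decode_sequences` helper; it simply drops every position whose most likely class is the blank class 10, which assumes blanks only appear at the end, as in the generated data.

```python
import numpy as np

BLANK = 10

def decode_sequences(predictions):
    """Hypothetical helper: turn the per-position softmax outputs into digit
    strings, dropping positions whose most likely class is the blank class."""
    classes = np.stack([np.argmax(p, axis=1) for p in predictions], axis=1)  # (images, 5)
    return ["".join(str(c) for c in row if c != BLANK) for row in classes]

# One fake image whose positions decode to 0, 8, 7, blank, blank
one_hot = np.eye(11)
fake_predictions = [one_hot[[c]] for c in (0, 8, 7, 10, 10)]
print(decode_sequences(fake_predictions))  # ['087']
```

Applied to the trained model, `decode_sequences(model.predict(test_images))` should give one string per test image, since the multi-output model returns one array per head.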

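In the run above, the model achieved good accuracy: about 98.5% for individual positions (digits or blanks) and about 92.9% for whole sequences. The lower sequence accuracy is expected, since all five positions must be correct at once; if the errors at each position were independent, we would expect roughly 0.985^5 ≈ 0.93, close to what we observe.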

© 2022 Sajal Sharma.