Wednesday, 22 May 2019

Digit Recognizer

This application lets a user draw a digit on a canvas and then tries to predict which digit was drawn. It uses a neural network for classification and is trained on the MNIST data set, which contains handwritten digits from 0 to 9. As such, the program can only classify digits from 0 to 9.

The link to the GitHub repo is here


Neural network models

Different neural network architectures were used to compare their performance. The list below describes the different models used and the nodes in each layer.
  1. Input 784 - Hidden 50 - Output 10 using only numpy
  2. Input 784 - Hidden 800 - Output 10 using tensorflow
  3. Input 784 - Hidden 800 - Hidden 800 - Output 10 using tensorflow
  4. Input 784 - Conv 32 5x5 filters with max pooling - Conv 64 5x5 filters with max pooling - Fully connected 1024 - Output 10
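
To get a feel for how much larger the fully connected models become, the number of trainable parameters can be counted from the layer sizes listed above. This is only a rough sketch: it assumes each layer has one bias per output node, which the list does not state, and it leaves out the convolutional model because its padding and pooling settings are not given. The helper name `dense_param_count` is made up for illustration.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases in a fully connected network.

    Assumption: every layer has one bias per output node.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total


# Layer sizes taken from models 1 to 3 in the list above.
models = {
    "model 1": [784, 50, 10],
    "model 2": [784, 800, 10],
    "model 3": [784, 800, 800, 10],
}

for name, sizes in models.items():
    print(name, dense_param_count(sizes))
```

Under that assumption, model 2 has roughly sixteen times as many parameters as model 1, and model 3 about twice as many as model 2.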

Model 1 reached 90% accuracy on the test data. Model 2 improved on this with 95%, while model 3 gained only slightly, reaching 96%. Finally, model 4 performed best, with 98% accuracy.

Model number 4 was subsequently chosen as the classifier for the digit recognizer application.

Purpose of the project

I made this application to understand neural networks better. This prompted me to build a neural network using only matrix multiplications in numpy, which gave me a deeper understanding of the calculations these models perform during classification.
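
As an illustration of what "only matrix multiplications" can look like, here is a minimal sketch of a forward pass through a 784-50-10 network. It is not the code from the project: the sigmoid hidden activation, the softmax output, the random initial weights and all names (`W1`, `b1`, `forward`, and so on) are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, randomly initialised parameters for a 784-50-10 network.
W1 = rng.normal(0.0, 0.01, size=(784, 50))
b1 = np.zeros(50)
W2 = rng.normal(0.0, 0.01, size=(50, 10))
b2 = np.zeros(10)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    # Subtract the row maximum for numerical stability.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def forward(x):
    """x has shape (batch, 784); returns class probabilities (batch, 10)."""
    hidden = sigmoid(x @ W1 + b1)
    return softmax(hidden @ W2 + b2)


batch = rng.random((5, 784))      # five fake flattened 28x28 "images"
probs = forward(batch)
predicted = probs.argmax(axis=1)  # most likely digit for each image
```

Each layer is one matrix product plus a bias, so the whole prediction step is two multiplications and two element-wise functions.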

After I had made a simple neural network, I noticed I was restricted to small architectures, that is, networks with few layers and few nodes in each layer. This was because everything was being calculated on the CPU.

As a result, I decided to implement the neural networks in TensorFlow, which used my GPU to perform calculations in parallel and significantly improved speed. This allowed me to explore deeper structures and computationally expensive architectures, in particular the convolutional neural network (CNN).


This is the GPU that I currently have: a GTX 1060.

Modules used

Python 3
Tkinter - for the GUI and canvas drawing application
Pillow - for saving the image and performing preprocessing such as filtering
numpy - for the arrays
tensorflow - for fast calculation on the GPU
pickle - to save weights after training
matplotlib - to visualize the dataset
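
As a small example of the pickle step, the sketch below saves a set of weights to a file and loads them back. The dictionary layout, the key names and the file name `weights.pkl` are assumptions made for illustration, not the format used in the project.

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical weights for a 784-50-10 network.
weights = {
    "W1": np.zeros((784, 50)),
    "b1": np.zeros(50),
    "W2": np.zeros((50, 10)),
    "b2": np.zeros(10),
}

# Write to a temporary directory so the example leaves no stray files.
path = os.path.join(tempfile.mkdtemp(), "weights.pkl")

with open(path, "wb") as f:
    pickle.dump(weights, f)

with open(path, "rb") as f:
    restored = pickle.load(f)
```

Pickle files can run arbitrary code when loaded, so this approach is only suitable for files you created yourself.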

Structure of the program

System Diagram

The canvas app talks to the neural network model to perform classification. Because the canvas is separated from the model, different models can be swapped in easily. The neural network trainer trains a specific neural network model by providing data in batches and performing optimization. After the neural network has been trained, the weights can be saved into a database.
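
The separation described above could be sketched roughly as follows. All class and function names here (`DigitModel`, `AlwaysZeroModel`, `CanvasApp`, `batches`) are hypothetical and stand in for the real components; the point is only that the canvas depends on a single `predict` method, and that a trainer can feed data in shuffled mini-batches.

```python
import random


class DigitModel:
    """Hypothetical interface that every classifier would implement."""

    def predict(self, pixels):
        """Return a digit 0-9 for a flat list of 784 pixel values."""
        raise NotImplementedError


class AlwaysZeroModel(DigitModel):
    """Placeholder model, used here only to show the swap."""

    def predict(self, pixels):
        return 0


class CanvasApp:
    """Stand-in for the drawing app; it only knows about predict()."""

    def __init__(self, model):
        self.model = model

    def classify(self, pixels):
        return self.model.predict(pixels)


def batches(samples, batch_size, seed=0):
    """Yield shuffled mini-batches, as a trainer might feed them."""
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [samples[i] for i in order[start:start + batch_size]]


app = CanvasApp(AlwaysZeroModel())
result = app.classify([0.0] * 784)
batch_sizes = [len(b) for b in batches(list(range(10)), batch_size=4)]
```

Swapping in a different classifier then means passing another `DigitModel` subclass to `CanvasApp`, with no change to the canvas code.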

Things to be improved

The application struggles with some digits, such as 9. Also, if the user does not draw the digit "nicely" in the box provided, the results can vary widely.

To improve this, perhaps better data preprocessing is required, such as stretching the digit to fill the canvas. However, I believe the biggest improvement would come from using a larger database of digits that contains skewed digits and digits of different sizes, to account for all the different possible variations.
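
One way the "stretch to fill" idea could look is sketched below: crop the image to the bounding box of the drawn pixels, then resample that crop back to 28x28. This is an untested idea rather than something the application does today. It uses plain numpy with nearest-neighbour sampling, assumes background pixels are exactly 0, and does not preserve the aspect ratio, so a thin digit such as 1 would be widened. The function name `stretch_to_fill` is made up for the example.

```python
import numpy as np


def stretch_to_fill(image, size=28):
    """Crop to the drawn pixels and stretch them to size x size.

    `image` is a 2-D array in which 0 means background (an assumption).
    """
    rows = np.flatnonzero(image.any(axis=1))
    cols = np.flatnonzero(image.any(axis=0))
    if rows.size == 0:
        return np.zeros((size, size), dtype=image.dtype)  # blank canvas
    cropped = image[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    # Map each output pixel back to a source pixel in the crop.
    r_idx = np.arange(size) * cropped.shape[0] // size
    c_idx = np.arange(size) * cropped.shape[1] // size
    return cropped[np.ix_(r_idx, c_idx)]


canvas = np.zeros((28, 28))
canvas[10:14, 5:8] = 1.0  # a small blob away from the centre
stretched = stretch_to_fill(canvas)
```

MNIST digits themselves are size-normalized and centred within the 28x28 image, so scaling the drawing into a centred box with a margin might match the training data more closely than filling the whole canvas.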