Using least squares-based classification to detect digits

Classifying digits using the discriminant least squares-based function in less than 100 lines of code.

The least-squares method for classification is based on linearly separating 2 or more classes. In this article, I’m going to show you how to create a Python program to classify images with digits from 0–9 using only NumPy and PIL.

More specifically, the least-squares method (LSM) is a statistical procedure to find the best fit for a set of data points by minimizing the sum of the offsets or residuals of points from the plotted curve. (source)

Here is a Google Drive link that contains all the data you’ll need. You’ll find a folder that contains the train and test images and their corresponding labels. The Train folder contains 240 images for each digit—there are 10 digits (0–9), so 2400 in total and 200 test images. Each image is only 28*28 pixels for simplicity.

The images in the Train folder will be used to train a classifier for each digit using the least-squares method with the corresponding training labels.txt files. The Train folder contains 240 images for each digit arranged—i.e. first 240 images are of 0’s, second 240 images are of 1’s, and so on.

Let’s begin

First, we import the libraries we’re going use. We’ll need only two—the first is NumPy, which we’ll use for all the image/array manipulation that we’re going to do. The second is PIL for the importing/exporting of images.

Now we need to import our images in a NumPy array since we know that we have 2400 images and each is 28*28 pixels. We’re going to flatten the image to be just one big array, so the dimension of each image is going to be (1, 784)— that is, 28*28.

For all images, we’re going to need an array of shape (2400, 784), but LSM requires an extra dimension that’s in form of an extra column of ones.
So the final array of images should be an array of shape (2400, 785)

Least Square Method Formula

Before we continue, I must elaborate on what the variables in the least square method represent. The w-tilda is the weight matrix that we desire from the method, the x-tilda is the input matrix, and t is the labels matrix.

Here’s an example method to get the x-tilda (the input matrix of the training images):

This function simply loops over our whole training folder, gets one image at a time, flattens it, adds a [1] to it, and puts it in the final array (X).

Next, we’re going to need the T, which is the training labels. The images are already ordered, i.e. the first 240 images are zeroes and the second 240 images are ones, and so on.

For example, if we need to get the T for digit 0, we know that the first 240 images contain 0, so the corresponding T will be an array of size 2400, all being -1, except the first 240 indices set to 1.

I’ve created this simple function that creates the labels matrix for any digit we want:

This function takes a number (for example 3) and returns the corresponding T, which is an array of size 2400, all set to -1 except the indices from 480 to 720, which are set to 1.

In the beginning, it creates an array of size (2400) that’s filled with -1. Then start represents the very first image with a 3 in it. Since our folder is ordered, we know that the first 3 images are image number 720. And the digit 3 goes on until image 960. In the loop, I multiply those 240 positions with -1 to become 1.

The following should get you an array that contains 1 in the first 240 indices and -1 in the rest [1,1,1,1,1,………….-1,-1,-1].

We have all our variables—now we only need to calculate the formula. For simplicity, I created a function to calculate A, which is the first part of the formula (((X-tilda-transpose) X-tilda)-inverse)X-tilda.

That is the part of the formula without the T.

Now we’re ready to get our test images

The test function takes as input the path of the test folder. It gets each image, flattens it, and adds a [1] to it. It creates an array of size 10 for each image that’s initially filled with zeroes.

Then it loops 10 times (since there are 10 classes, and with each loop it generates a probability that predicts how likely it is that this image belongs to a given class.

Then it returns the first maximum of this array and this is the final output for this image. resultLabels is the array that contains the predicted class for each of the 200 images.

These are the original labels generated from the test labels text file.

Confusion Matrix

In order to really get a feel of how our classifier is performing, let’s generate a confusion matrix to see more clearly.

This function takes as input the original correct results and the predicted results in order to compare them.

This is the confusion matrix output. Let’s look at an example of how to read it, using row 1 (images that contain 0). There are 19 correctly classified and 1 misclassified (6). In the 3rd row (images that contain 3), there is 1 misclassified as 0, 3 misclassified as 2, 11 correctly classified as 3, and so on.

If you want to visualize the confusion matrix

Conclusion

There are other types of linear/non-linear classifiers that handle the same problem. Although the results of the least-squares method weren’t bad, we could definitely yield better results if we used a larger dataset to train the classifier to do its work.

Avatar photo

Fritz

Our team has been at the forefront of Artificial Intelligence and Machine Learning research for more than 15 years and we're using our collective intelligence to help others learn, understand and grow using these new technologies in ethical and sustainable ways.

Comments 0 Responses

Leave a Reply

Your email address will not be published. Required fields are marked *