In this piece, we’ll build a deep learning model to classify objects in an image. To build the convolutional neural network (CNN), we’ll use this dataset available on Kaggle. A CNN is a type of neural network used primarily for visual tasks. The network will detect the features of an animal and then use those features to classify a given input image as either a cat or a dog.
Importing Necessary Packages
Let’s start by activating our virtual environment:
conda activate my_env
Run the following command to install keras and tensorflow:
conda install tensorflow keras pillow
Here, we’ve also installed pillow to facilitate the loading of images later.
Now import the following packages:
- Sequential for initializing the artificial neural network
- Convolution2D for implementing the convolution network that works with images
- MaxPooling2D for adding the pooling layers
- Flatten for converting pooled feature maps into one column that will be fed to the fully connected layer
- Dense for adding a fully connected layer to the neural network
Initializing the Neural Network
Next, we’ll use the Sequential class to initialize a linear stack of layers. For a classification problem like this one, we usually store the model in a variable named classifier.
We now have an instance of the neural network, but it doesn’t do anything by itself. We’ll need to apply a function to the dataset, which means the network needs a convolution layer. We’ll add this layer in the next step.
Adding a Convolution Layer
We add the layer by calling the add function on the classifier and passing it a Convolution2D layer configured with the required parameters. The first parameter (filters) is the number of output filters in the convolution. These filters are also referred to as feature detectors.
The second and third parameters represent the height and width of the 2D convolution window. input_shape is the shape of the input image. Black and white images are converted to 2D arrays, while colored images are converted to 3D arrays.
A convolution is a mathematical operation on two functions that produces a third function describing how one of them modifies the other.
This process involves three key items: the input image, the feature detector, and the feature map. A feature map is obtained by sliding the feature detector across the matrix representation of the input image and, at each position, multiplying the overlapping values element-wise and summing the results. This process reduces the size of the image while retaining the features that are important for classifying input images and discarding those that are not. Each feature map records where in the image a particular feature occurs.
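To make the sliding-window idea concrete, here is a minimal pure-Python sketch of how a feature map could be computed. The function name convolve2d, the tiny 4×4 image, and the 2×2 detector are made up for illustration, and the sketch assumes no padding and a stride of one. Like the convolution layers in deep learning libraries, it does not flip the detector before multiplying.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    feature_map = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            # Multiply the overlapping values element-wise and sum them.
            total = 0
            for i in range(k_h):
                for j in range(k_w):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)
        feature_map.append(out_row)
    return feature_map


# Hypothetical single-channel 4x4 image and a 2x2 diagonal feature detector.
image = [
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
]
detector = [
    [1, 0],
    [0, 1],
]

feature_map = convolve2d(image, detector)
print(feature_map)  # [[2, 1, 0], [1, 2, 1], [0, 1, 2]]
```

The largest values sit where the diagonal pattern of the detector lines up with the image, and the 4×4 input shrinks to a 3×3 feature map.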
In this case, we’re working with colored images. Therefore, we’ll pass three channels to the input_shape parameter. We’ll also need to pass in the dimensions of the 2D array for each channel. The final parameter we’ll pass is the activation function. Since image classification is a non-linear task, we’ll use the rectifier function (ReLU). It replaces every negative value in the feature map with zero, which introduces non-linearity into the network.
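The effect of the rectifier is easy to see on a handful of numbers. The following is only an illustrative sketch with made-up values, not the Keras implementation.

```python
def relu(value):
    """Rectifier: keep positive values, replace negative values with zero."""
    return max(0.0, value)


# Hypothetical values taken from a feature map before activation.
feature_values = [-2.0, -0.5, 0.0, 1.5, 3.0]
activated = [relu(v) for v in feature_values]
print(activated)  # [0.0, 0.0, 0.0, 1.5, 3.0]
```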
We now have a CNN that will detect features in the image dataset. In the next step, we’ll reduce the size of these features using pooling. This will help reduce the computation time of the deep learning model.
Pooling to Reduce the Size of the Feature Map
We’ll now add a pooling layer to the network in order to reduce the size of the feature maps. We use a 2×2 pool size for max pooling. This reduces the size of the images while retaining important information.
Ideally, the position of an object in an image shouldn’t affect the network’s ability to perceive its unique features. Since images differ widely in lighting and in the angle from which the picture was taken, pooling helps the neural network detect the unique features, those differences notwithstanding.
Max pooling places a 2×2 matrix on the feature map and picks the largest value in that box. The 2×2 matrix moves across the entire feature map and picks the largest value in each move. The obtained values form a matrix known as a pooled feature map.
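As a rough sketch of the procedure just described, the function below applies non-overlapping 2×2 max pooling to a small feature map. The name max_pool and the example values are hypothetical, and the window is assumed to move by its own size on each step.

```python
def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: the window moves `size` cells at a time."""
    pooled = []
    for row in range(0, len(feature_map) - size + 1, size):
        pooled_row = []
        for col in range(0, len(feature_map[0]) - size + 1, size):
            # Collect the values under the window and keep the largest one.
            window = [
                feature_map[row + i][col + j]
                for i in range(size)
                for j in range(size)
            ]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4x4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 1, 4],
]

pooled = max_pool(feature_map)
print(pooled)  # [[4, 5], [3, 7]]
```

The 4×4 feature map becomes a 2×2 pooled feature map, keeping only the strongest response in each region.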
Max pooling is important because it preserves the image’s unique features while reducing the size of the image. This process also reduces overfitting because the CNN only receives the features that are important for the classification task.
In the next step, we’ll transform these feature maps into a format that can be accepted by the deep learning model.
Flattening the Feature Maps
Time to flatten all the feature maps into a single vector. This vector is then passed to the fully connected layers for processing. This is achieved by adding a Flatten() layer to the classifier.
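Conceptually, flattening just unrolls every pooled feature map into one long list. The sketch below uses two made-up 2×2 pooled maps. The order in which Keras lays out the values may differ from this simplified version, but the idea is the same.

```python
def flatten(pooled_maps):
    """Unroll a list of 2D pooled feature maps into a single flat vector."""
    return [value for fmap in pooled_maps for row in fmap for value in row]


# Two hypothetical pooled feature maps, each 2x2.
pooled_maps = [
    [[4, 5], [3, 7]],
    [[1, 0], [2, 6]],
]

vector = flatten(pooled_maps)
print(vector)  # [4, 5, 3, 7, 1, 0, 2, 6]
```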
At this point, the features are in a structure that can be fed to the neural network. However, we have to add a layer that will give us the output after feeding the features into the neural network. This will be the subject of the next step.
Adding Layers to the Neural Network
We’ll use the vector we obtained above as the input for the neural network by using the Dense function in Keras.
The first parameter it takes is units, which is the number of nodes in the hidden layer. The optimum number of units can be determined via experimentation. The second parameter is the activation function. The ReLU activation function is usually used in this layer.
The flattened feature map is passed to the fully connected part of the network. The input layer, the fully connected layer, and the output layer are involved in this process. The fully connected layer plays the same role as a hidden layer in an ordinary artificial neural network: each of its nodes is connected to every value in the flattened vector. The predicted image classes are obtained from the output layer.
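To illustrate what a fully connected layer computes, here is a small sketch of a forward pass through one such layer. The weights, biases, and inputs are invented for the example. In a real network they are learned during training.

```python
def relu(value):
    return max(0.0, value)


def dense(inputs, weights, biases, activation):
    """Each node takes a weighted sum of ALL inputs, adds a bias, then activates."""
    outputs = []
    for node_weights, bias in zip(weights, biases):
        weighted_sum = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(activation(weighted_sum))
    return outputs


# Hypothetical flattened vector and a layer with two nodes (units=2).
inputs = [1.0, 2.0, 3.0]
weights = [
    [0.5, -1.0, 0.0],  # weights of node 1
    [1.0, 1.0, -1.0],  # weights of node 2
]
biases = [0.0, 0.5]

hidden = dense(inputs, weights, biases, relu)
print(hidden)  # [0.0, 0.5]
```

Node 1 ends up with a negative weighted sum (-1.5), so ReLU sets it to zero, while node 2 passes its positive value through unchanged.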
The network computes the predictions as well as the errors in the prediction process. The network improves its predictions via backpropagation of the errors. The final result is a number between zero and one, which represents the predicted probability that the image belongs to class 1.
We’re now ready to add the output layer. In this layer, we’ll use the sigmoid activation function since we expect a binary outcome. If we expected more than two possible outcomes, we’d have used the softmax function.
We set units to 1 here, since a single output node is enough: it gives the predicted probability of one class, and the probability of the other class is simply its complement.
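The difference between the two activation functions can be sketched in a few lines of plain Python. The input scores below are arbitrary and serve only to show the shape of the outputs.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """Turn a list of scores into probabilities that sum to one."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


# Binary case: one output node, one probability.
probabilities = [sigmoid(z) for z in (-4.0, 0.0, 4.0)]
print(probabilities)

# Multi-class case: one probability per class.
class_probabilities = softmax([2.0, 1.0, 0.1])
print(class_probabilities)
```

A score of zero maps to a probability of exactly 0.5 under the sigmoid, while strongly negative and strongly positive scores approach zero and one respectively.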
We now have all the layers for the deep learning model in place. However, before we can start training the model, we have to ensure that we’re reducing the errors that occur during the training process. This maximizes the chances of getting good results from the model. Therefore, in the next step, we’ll implement a strategy that will reduce errors during training.
Compiling the CNN
Compiling the CNN is done using the compile function. The function expects three arguments:
- the optimizer
- the loss function
- the performance metrics
We’ll apply gradient descent as an optimizer for the model. In this case, the binary_crossentropy loss function is most appropriate since this is a binary classification problem.
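To get a feel for what the binary_crossentropy loss measures, here is a simplified pure-Python version. It is a sketch of the standard formula rather than the Keras implementation, and the labels and predictions are invented. The small eps value is an assumption added to avoid taking the logarithm of zero.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Average of -[t*log(p) + (1-t)*log(1-p)] over all samples."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep p away from exactly 0 or 1
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


# Hypothetical labels (1 = class 1, 0 = class 0) and two sets of predictions.
labels = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]
unsure = [0.6, 0.4, 0.5, 0.5]

loss_confident = binary_crossentropy(labels, confident)
loss_unsure = binary_crossentropy(labels, unsure)
print(loss_confident, loss_unsure)
```

Predictions that are both correct and confident produce a lower loss than hesitant ones, which is exactly what training tries to achieve.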
Gradient descent is an optimization strategy that reduces the error during training by searching for the point where the cost function is at its minimum. It does this by computing the slope (gradient) of the cost function at the current point and taking a step in the downhill direction, repeating until it settles into a minimum, which may be a local rather than the global one. Here, we’ll use the popular Adam optimizer, a variant of gradient descent that adapts the step size for each weight.
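The core loop of gradient descent fits in a few lines. The sketch below minimizes a made-up one-dimensional cost function, (x - 3)², whose minimum is known to be at x = 3. It shows plain gradient descent with a fixed learning rate, not Adam, and all names and values are illustrative.

```python
def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to walk downhill on the cost."""
    x = start
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x


def cost(x):
    # Hypothetical cost function with its minimum at x = 3.
    return (x - 3.0) ** 2


def cost_gradient(x):
    # Derivative (slope) of the cost function above.
    return 2.0 * (x - 3.0)


minimum = gradient_descent(cost_gradient, start=0.0)
print(round(minimum, 4))  # 3.0
```

Each step moves x a little closer to the point where the slope is zero. A neural network does the same thing, except that x is replaced by all of the network’s weights.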
Now that we’ve set up how errors will be minimized during training, we’re ready to fit the classifier to the training images.
Fitting the CNN
Before we can fit the CNN, we’ll pre-process the images using Keras in order to reduce overfitting. This process is known as image augmentation. We’ll use the ImageDataGenerator class for this purpose.
from keras.preprocessing.image import ImageDataGenerator
The generator will rescale, zoom, shear, and flip the images. The rescale parameter multiplies each pixel value by the given factor so that the values fall between zero and one. horizontal_flip=True randomly flips images horizontally.
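Two of these transformations are simple enough to sketch by hand. The snippet below is an illustration on a tiny made-up grayscale image, not what Keras runs internally. It assumes 8-bit pixel values (0 to 255), so the rescale factor is 1/255.

```python
def rescale(image, factor=1.0 / 255):
    """Multiply every pixel by `factor` so values land between 0 and 1."""
    return [[pixel * factor for pixel in row] for row in image]


def horizontal_flip(image):
    """Mirror the image left to right by reversing each row."""
    return [row[::-1] for row in image]


# Hypothetical 2x3 grayscale image with 8-bit pixel values.
image = [
    [0, 51, 255],
    [102, 204, 153],
]

scaled = rescale(image)
flipped = horizontal_flip(image)
print(flipped)  # [[255, 51, 0], [153, 204, 102]]
```

Flipping produces a new, equally valid training image from an existing one, which is why augmentation helps the model generalize.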
After that, we’ll also need to rescale the test data images using ImageDataGenerator.
Next, we’ll create a training set using train_datagen. Use flow_from_directory to obtain the images from your current working directory. Pass in the path as the first parameter and target_size as the second parameter.
The target_size is the size that every image will be resized to. We’ll use 256×256 since we’ve already specified it above. batch_size is the number of images that pass through the network before the weights are updated. We specify class_mode as binary since this is indeed a binary classification problem.
Run the following command to load in the training images. Since our Notebook and the training_set are in the same folder, the images will be loaded without any errors.
Now we’ll create a test set with similar parameters as above. Again, since our Jupyter Notebook and the test_set are in the same folder, the test set images will be loaded without any errors.
Fitting the classifier to the training set is done by calling the fit_generator function on the classifier object (recent Keras releases deprecate fit_generator in favor of fit, which accepts generators directly). We pass in the training set as the first argument. steps_per_epoch is the number of batches drawn from the generator before one epoch is considered finished. epochs is the number of complete passes over the training data. validation_steps is the number of batches drawn from the validation generator to evaluate the model at the end of each epoch.
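A common way to choose the step counts is to divide the number of images by the batch size and round up, so that every image is seen once per epoch. The image counts and batch size below are hypothetical and only show the arithmetic.

```python
import math


def steps_for(num_images, batch_size):
    """Number of batches needed to cover every image once."""
    return math.ceil(num_images / batch_size)


# Hypothetical dataset sizes and batch size.
train_steps = steps_for(8000, 32)
validation_steps = steps_for(2000, 32)
print(train_steps, validation_steps)  # 250 63
```

The validation set does not divide evenly into batches of 32, so the final batch is smaller and the count is rounded up to 63.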
To recap, in this step, we loaded in the training and test images, pre-processed them, and fitted the training set to the model we created. It’s now time to test the model on an unseen test image.
Making a Prediction
First, we’ll have to pre-process the images. This can be achieved with the help of numpy and image. image will be used to load the new images, while numpy will be used to convert them into numpy arrays.
We can now load in the image that we’d like to predict. This is done using the load_img function from the image module. Pass in the location of the image as the first argument and size of the image as the second argument. Use the same image size as the one used during model training.
Since we’re using colored images, we have to transform the test image into a 3D array. We can realize this using the img_to_array function from the image module.
At this point, the image is a three-dimensional array. However, before we can make the prediction, we need to add a fourth dimension, which corresponds to the batch size.
As you probably noticed earlier, we passed in the images in batches. In this case, we’ll have one batch of one input image. The expand_dims function from NumPy will enable us to add this fourth dimension.
The first argument we pass to it is the test image, and the second is the position of the dimension that we want to add. Add it to the first position, since that’s where the neural network expects it to be. The first position corresponds to axis 0:
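The effect on the array’s shape can be checked with a placeholder image. In this sketch, an array of zeros stands in for the real test image, and the 256×256 size is assumed to match the target_size used during training.

```python
import numpy as np

# Placeholder for a loaded colour image: height x width x channels.
test_image = np.zeros((256, 256, 3))
print(test_image.shape)  # (256, 256, 3)

# Add the batch dimension at the front (axis 0).
batch = np.expand_dims(test_image, axis=0)
print(batch.shape)  # (1, 256, 256, 3)
```

The leading 1 tells the network that it is receiving a batch containing a single image.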
Now use the predict method to predict which class the image belongs to. This will give you a number between zero and one. This represents the probability of the image being of class 1.
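One way to turn that probability into a readable label is to apply a threshold and look the result up in a label mapping. The sketch below is an assumption-laden illustration: the helper predicted_label, the 0.5 threshold, the example probabilities, and the folder names cats and dogs are all hypothetical.

```python
def predicted_label(probability, class_indices, threshold=0.5):
    """Map a class-1 probability to a label using a {label: index} mapping."""
    index_to_label = {index: label for label, index in class_indices.items()}
    predicted_index = 1 if probability >= threshold else 0
    return index_to_label[predicted_index]


# Hypothetical mapping from class label to class index.
class_indices = {"cats": 0, "dogs": 1}

label_high = predicted_label(0.92, class_indices)
label_low = predicted_label(0.08, class_indices)
print(label_high, label_low)  # dogs cats
```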
The class_indices attribute from the training set will help us in getting the class labels.
The output we get will look like this: