Real-Time Object Detection on Raspberry Pi Using OpenCV DNN

and MobileNet-SSD

Running deep learning models is computationally expensive. And when it comes to image processing with computer vision, the first thing that comes to mind is high-end GPUs—think the 1080ti and now the 2080ti.

But it’s hard to run computer vision models on edge devices like Raspberry Pi, and making a portable solution is difficult with deep learning libraries like TensorFlow or PyTorch.

For this task, it’s almost compulsory to add OpenCV to help pre-process data. And the good news is that OpenCV itself includes a deep neural network module, known as OpenCV DNN. It runs much faster than other libraries, and conveniently, it only needs OpenCV in the environment. As a result, OpenCV DNN can run on a CPU’s computational power with great speed.

The best use case of OpenCV DNN is performing real-time object detection on a Raspberry Pi. This process can run in any environment where OpenCV can be installed and doesn’t depend on the hassle of installing deep learning libraries with GPU support. As such, this tutorial isn’t centered on Raspberry Pi—you can follow this process for any environment with OpenCV.

How Does Object Detection with OpenCV DNN Work?

Previously, I wrote this piece:

While writing the above article, I realized there are lots of code examples available online, but I couldn’t find any output analysis using OpenCV DNN for object detection. So I figured, why not explore the OpenCV DNN module?

So in this tutorial, we’ll be exploring how object detection works with OpenCV DNN and MobileNet-SSD (in terms of inference).

We’ll be using:

  1. Python 3
  2. OpenCV [Latest version]
  3. MobileNet-SSD v2

OpenCV DNN supports models trained from various frameworks like Caffe and TensorFlow. It also supports various networks architectures based on YOLO, MobileNet-SSD, Inception-SSD, Faster-RCNN Inception,Faster-RCNN ResNet, and Mask-RCNN Inception.

Because OpenCV supports multiple platforms (Android, Raspberry Pi) and languages (C++, Python, and Java), we can use this module for development on many different devices.

Why OpenCV DNN?

OpenCV DNN runs faster inference than the TensorFlow object detection API with higher speed and low computational power. We will see the performance comparison in a future blog post.

Why MobileNet-SSD?

MobileNet-SSD can easily be trained with the TensorFlow-Object-Detection-API, Lightweight

Check out the official docs for more:

Getting Started

First things first, here’s a GitHub repo I created that allows you to explore this module:


Installing OpenCV for python

pip3 install opencv-python

Download the pre-trained model from the above link.

We’ll be using MobileNet-SSD v2 for our object detection model, as it’s more popular—let’s download its weights and config.

From the weights folder (after unzipping), we use the frozen_inference_graph.pb file.

Project Structure

(I’m using virtualenv for this tutorial, so there is venv, but this isn’t mandatory.)

Lets code

All of the code can be found in

Loading models (Those we have downloaded)

For the purposes of this tutorial, let’s use this image:

Download the image into the code directory; then read the image with OpenCV and show it:

Feeding the image to the network

To feed image into the network, we have to convert the image to a blob. A blob is a pre-processed image that serves as the input. Pre-processing techniques like resizing according to the model, color swapping, cropping, and color channels mean subtraction (normalizing color channel values by subtracting a mean value).

We’re resizing the image to 300 x 300 as our pre-trained model supports and swapping the color channels from BGR to RGB. OpenCV reads in BGR, while RGB is commonly used in model training. RGB is more popular.

Feeding forward the model

We got the output, so now we need to understand it. For the input image we have given, the shape of the output matrix is (1, 1, 100, 7), Our main concern is output [0,0,:,:]

Thresholding Results

Each detection output gives a predicted confidence in a range of 0 to 1. But most of them are false positive (falsely detected). So we’ll keep only objects of higher confidence. For that, let’s set a threshold of .5. We’ll consider everything up to that threshold when drawing the bounding box around the image.

For the predicted class, we get the id. From the id, we’ll get the class name from the classNames dictionary.

Converting id to class name

This function below takes class_id and returns the class name according to the id from classNames

In this loop, we’re printing Class id, confidence, and class name for debugging purposes before drawing the box.

This is the output we got from the above code:

1.0 0.6377985 person
18.0 0.84042233 dog

The Bounding Box

It’s time to draw the box in the image. To draw the bounding box in the image for the predicted object, we need x, y, width, and height.

But we need to scale the values of the box according to our image height and width. But the image is 3 dimensional, as it also includes color channels, and we’re only taking height and width. So we skip the color channel input with “_”

After scaling

Once we’ve scaled the image, we have to draw the box with those new values in the image

OpenCV’s rectangle function takes arguments for the image input, rectangle x, rectangle y, rectangle width, and rectangle height. It also takes color and thickness.

Adding Text

Now that we’ve drawn the bounding boxes, let’s add the class text in the box:

OpenCV’s putText function takes arguments for image, text, starting x, starting y, font type, font size, and text color. In the above code snippet, We’ve scaled the starting y and font size according to the image dimensions.

Voilà! Now we’ve got our desired bounding box in the detected objects, and we’ve added labels to each of them.


My hope is that this tutorial has provided an understanding of how we can use the OpenCV DNN module for object detection. And with MobileNet-SSD inference, we can use it for any kind of object detection use case or application.

Now that we have an understanding of the output matrix, we can use the output values according to our application’s need. It gets more fun when you run a custom-trained model—maybe we can see this in a future blog post!

Discuss this post on Hacker News and Reddit.

Find me:



Avatar photo


Our team has been at the forefront of Artificial Intelligence and Machine Learning research for more than 15 years and we're using our collective intelligence to help others learn, understand and grow using these new technologies in ethical and sustainable ways.

Comments 0 Responses

Leave a Reply

Your email address will not be published. Required fields are marked *

wix banner square