Getting Started with Object Detection Using TensorFlow.js

In deep learning, one of the most widely used technologies is TensorFlow, an end-to-end open-source platform for machine learning. It has a vast, flexible ecosystem of tools, libraries, and community resources, which researchers and developers use to build and deploy ML-powered applications.

In this tutorial, we’re going to work with TensorFlow.js, TensorFlow’s JavaScript library. We’ll use it to perform object detection, specifically to detect people, using our device’s webcam. The idea is simple: the app watches the camera feed in observation mode, and when it detects a person, it records video until that person is no longer in the camera’s view.

For our object detection model, we’ll use COCO-SSD, one of the pre-trained models available for TensorFlow.js. More on that next.

What is COCO-SSD?

COCO-SSD is an object detection model based on the TensorFlow Object Detection API. SSD stands for Single Shot MultiBox Detection. The model can detect the 80 object classes of the COCO dataset, whose label IDs run up to 90. COCO (Common Objects in Context) is a large-scale object detection, segmentation, and captioning dataset, and the model was trained on it to recognize these objects.
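To make the model’s output concrete, here is a small sketch. Each prediction returned by the COCO-SSD package’s detect() method has a bbox ([x, y, width, height] in pixels), a class label, and a confidence score. The prediction values below are invented for illustration. countByClass is a hypothetical helper that tallies the classes found in one frame.

```javascript
// Illustrative output of COCO-SSD's model.detect(): one object per detection,
// with bbox = [x, y, width, height] in pixels, the class label, and a
// confidence score between 0 and 1. These numbers are made up.
const samplePredictions = [
  { bbox: [34, 20, 180, 410], class: "person", score: 0.91 },
  { bbox: [250, 300, 90, 60], class: "cup", score: 0.64 },
  { bbox: [400, 40, 150, 380], class: "person", score: 0.77 },
];

// Hypothetical helper: count how many objects of each class were detected.
function countByClass(predictions) {
  const counts = {};
  for (const { class: label } of predictions) {
    counts[label] = (counts[label] || 0) + 1;
  }
  return counts;
}

console.log(countByClass(samplePredictions)); // { person: 2, cup: 1 }
```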

Now that we have some context for our project, let’s get started!

Imports and Configurations

First, we import the libraries this tutorial needs: React, TensorFlow.js, and the COCO-SSD model package:
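The original import list isn’t reproduced here. As an assumption, a typical set for this project looks like the following, with the @tensorflow/tfjs and @tensorflow-models/coco-ssd packages installed from npm (for example, with yarn add).

```javascript
// React hooks used by the App component
import React, { useEffect, useRef, useState } from "react";
// TensorFlow.js; importing it registers the backends COCO-SSD runs on
import "@tensorflow/tfjs";
// Pre-trained COCO-SSD model: cocoSsd.load() resolves to a model with detect()
import * as cocoSsd from "@tensorflow-models/coco-ssd";
```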

Next, we create a functional component with a state variable and several refs, as shown in the code snippet below:

Here, the component is named App, and it is also the main component. Its state variable, records, is initialized with the useState hook. Several other constants are defined with the useRef hook, and these refs hold the objects needed to configure and control the video manually.

Now, on initial load, we need to set up the webcam video source and load COCO-SSD, as shown in the code snippet below:

Here, we have a prepare function that performs the following operations:

First, it disables the Start and Stop buttons.
Second, it requests access to the webcam.
Third, it loads COCO-SSD and assigns it to Model.
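Here is a minimal sketch of those three steps. It is not the original implementation. To keep it runnable outside the browser, the browser pieces are passed in as parameters, which is a design choice for this sketch only. In the app, mediaDevices would be navigator.mediaDevices, loadModel would call cocoSsd.load(), and videoEl would be the video element held in a ref. setButtonsDisabled is a hypothetical callback. Re-enabling the buttons once setup succeeds is also an assumption.

```javascript
// Sketch of prepare(): browser dependencies are injected (assumption) so the
// steps can be read and exercised without a real webcam.
async function prepare({ mediaDevices, loadModel, videoEl, setButtonsDisabled }) {
  // First: disable Start/Stop so nothing runs before setup finishes.
  setButtonsDisabled(true);
  // Second: request webcam access and show the live stream in the <video> element.
  const stream = await mediaDevices.getUserMedia({ video: true, audio: false });
  videoEl.srcObject = stream;
  await videoEl.play();
  // Third: load COCO-SSD once; the same model is reused for every frame.
  const model = await loadModel();
  // Setup done (assumption): let the user start and stop observation.
  setButtonsDisabled(false);
  return { stream, model };
}
```

In the component, you might call prepare() from a useEffect that runs once on mount and then store the returned model in the ref that the detection loop reads.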

Start and Stop Recording

To start recording, we first check that the camera is available. If it is, we create a new MediaRecorder instance for its stream. Then, we create a video object from the recording and add it to the records state variable:
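The following is a minimal sketch of startRecording() under stated assumptions, not the original code. The MediaRecorder class is passed in as RecorderClass (window.MediaRecorder in the app) so the sketch can be tested without a browser. recorderRef, onVideo, and the title field are hypothetical names.

```javascript
// Sketch of startRecording(). Assumptions: recorderRef is a useRef-style
// { current } object, stream is the webcam MediaStream, RecorderClass is
// MediaRecorder, and onVideo receives each finished recording.
function startRecording({ recorderRef, stream, RecorderClass, onVideo }) {
  // Already recording: nothing to do (the detection loop may call this every frame).
  if (recorderRef.current) return;
  // No camera stream: the webcam is unavailable or still starting.
  if (!stream) return;

  const chunks = [];
  const recorder = new RecorderClass(stream, { mimeType: "video/webm" });
  // Collect the encoded video data as the recorder emits it.
  recorder.ondataavailable = (event) => {
    if (event.data && event.data.size > 0) chunks.push(event.data);
  };
  // When recording stops, combine the chunks into one video file.
  recorder.onstop = () => {
    const blob = new Blob(chunks, { type: "video/webm" });
    onVideo({ title: new Date().toISOString(), blob });
  };
  recorder.start();
  recorderRef.current = recorder;
}
```

In the component, onVideo could append { title, href: URL.createObjectURL(blob) } to records through setRecords, so the table can link to each clip. The "video/webm" type works in Chrome and Firefox, and MediaRecorder.isTypeSupported() lets you check whether other formats are available.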

To stop recording, we stop the MediaRecorder instance that was created when recording started. For that, we simply call the stop method provided by the instance itself:
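Here is a matching stopRecording() sketch. It assumes that recorderRef is a hypothetical { current } ref holding the active MediaRecorder, or null when idle. Clearing the ref afterwards lets the next start create a fresh recorder.

```javascript
// Sketch of stopRecording(). Assumption: recorderRef.current holds the
// MediaRecorder created when recording started, or null when idle.
function stopRecording(recorderRef) {
  const recorder = recorderRef.current;
  // Nothing to stop: the detection loop may call this on every frame
  // in which no person is visible.
  if (!recorder) return;
  // stop() makes the recorder emit its final "dataavailable" event and then
  // "stop", where the chunks are turned into a video file.
  if (recorder.state !== "inactive") recorder.stop();
  recorderRef.current = null;
}
```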

Detecting a Human

The main function, which triggers the start and end of each recording, needs to handle the following points:

Check the shouldRecordRef variable, which toggles between start and stop.
If observation has started, pass the video source to the COCO-SSD model’s detect() method.
The model returns an array of predictions, which is empty when nothing is found. If any prediction has the class "person", the model found a human on camera, so we set foundPerson to true.
Use foundPerson to decide whether to start or stop recording.
As written, this function runs only once. To keep checking, call requestAnimationFrame with the same function so that it runs again before the browser’s next repaint and processes a new video frame. Each call then exits, and the loop ends when shouldRecordRef switches to stop.

The last point is exactly why we use refs instead of state here. Because the function keeps calling itself, every call still sees the state values captured when the loop started, so later changes (such as pressing Stop) would never reach it. A ref’s current property, by contrast, always holds the latest value. This is what lets the loop keep detecting the person for as long as they’re visible on the webcam.
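The following plain-JavaScript sketch imitates that situation without React. detectFrame captures a value when the loop starts, the way a state variable is captured, and it also reads a mutable { current } object, the way a ref is read. createFrameQueue is a hypothetical stand-in for requestAnimationFrame that lets us step frames by hand.

```javascript
// Hypothetical stand-in for requestAnimationFrame: callbacks wait in a queue
// until tick() runs the next one.
function createFrameQueue() {
  const queue = [];
  return {
    request: (callback) => queue.push(callback),
    tick: () => {
      const callback = queue.shift();
      if (callback) callback();
    },
  };
}

// shouldRecordState is captured once, like state seen by the render that
// started the loop; shouldRecordRef is read again on every frame, like a ref.
function startLoop(shouldRecordState, shouldRecordRef, frames, seen) {
  function detectFrame() {
    seen.push({ state: shouldRecordState, ref: shouldRecordRef.current });
    // Keep looping only while the ref says recording is on.
    if (shouldRecordRef.current) frames.request(detectFrame);
  }
  detectFrame();
}

const frames = createFrameQueue();
const shouldRecordRef = { current: true };
const seen = [];
startLoop(true, shouldRecordRef, frames, seen);
// The user clicks Stop: the running loop still holds the old `true`,
// and only the ref reflects the change.
shouldRecordRef.current = false;
frames.tick();
console.log(seen[1]); // { state: true, ref: false }
```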

We’ll implement this functionality with the following code snippet:
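Here is a minimal sketch of that loop, not the original snippet. The model, the video element, the recording callbacks, and the frame scheduler are all passed in. In the browser, requestFrame would be window.requestAnimationFrame, and startRecording/stopRecording would wrap the functions from the previous section. createDetectFrame is a hypothetical name.

```javascript
// Sketch of the per-frame detection loop. Assumptions: `model` behaves like a
// loaded COCO-SSD model (detect(video) resolves to [{ bbox, class, score }]),
// shouldRecordRef is a useRef-style { current } object, startRecording and
// stopRecording are safe to call repeatedly, and requestFrame schedules a callback.
function createDetectFrame({ model, videoEl, shouldRecordRef, startRecording, stopRecording, requestFrame }) {
  async function detectFrame() {
    // 1. Observation stopped: end any recording and leave the loop.
    if (!shouldRecordRef.current) {
      stopRecording();
      return;
    }
    // 2. Run COCO-SSD on the current video frame.
    const predictions = await model.detect(videoEl);
    // 3. An array always comes back (possibly empty); only "person" counts.
    const foundPerson = predictions.some((prediction) => prediction.class === "person");
    // 4. Record while someone is in view; stop once nobody is.
    if (foundPerson) {
      startRecording();
    } else {
      stopRecording();
    }
    // 5. Run again on the next animation frame.
    requestFrame(detectFrame);
  }
  return detectFrame;
}
```

Stopping on the first frame without a person can split one visit into several clips if the model misses a frame. Keeping a short history of recent results before stopping is one way to smooth that out.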

For the UI, we use Bootstrap to create a simple two-column interface with two buttons to start and stop recording, plus a table that lists the recorded files. The complete UI code is provided in the code snippet below:

Our functional component returns this markup directly, since function components don’t have a separate render method.

And that’s it! The implementation is finished, so now we just need to test it to make sure it works properly. To run this React project, run the following command in the project’s terminal:

yarn run dev

We will get results similar to the ones shown in the simulation below:

Wrapping Up

In this tutorial, we learned how to use COCO-SSD to detect a person with our webcam. By combining React, the browser’s MediaRecorder API, and a pre-trained TensorFlow.js model, we built an app that records only while someone is in view, and the same approach carries over to many other object detection tasks.

You could also build an application like this with other JavaScript frameworks and runtimes, such as Vue or Electron. Sign-language detection, eye-movement detection, and browser-based games are a few other possibilities with TensorFlow.js.

For convenience, the entire code for this tutorial is available on GitHub:
