How LiDAR Object Detection Works

LiDAR is a sensor that is currently changing the world. It is used in self-driving vehicles, autonomous drones, robots, satellites, rockets, and much more.

This sensor perceives the world in 3D by emitting laser beams and measuring the time it takes for them to come back. The output is a point cloud.

It’s a harder technology to understand than the camera because fewer people have access to it. I have had access to it, and today I will help you understand how LiDAR detection works for self-driving cars.

LiDAR’s underlying technology could power a wide array of use cases, but currently, self-driving vehicles offer us the best method of exploring this technology.


LiDAR — A 3D Light Sensor

A LiDAR sensor is composed of two parts: laser emission (top) and laser reception (bottom). The emission system works with several layers of laser beams (often called channels). The more layers, the denser and more accurate the point cloud. In the image above, you can see that the more layers, the bigger the sensor.

Lasers are sent to obstacles and reflect

When these lasers hit an obstacle, each reflection produces a point, and together these points form a point cloud. The sensor works on the Time of Flight (ToF) principle: it measures the time it takes for every laser beam to reflect and come back, and converts that time into a distance.
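The Time of Flight principle boils down to one line of arithmetic: the pulse travels to the obstacle and back, so the distance is the speed of light times the round-trip time, divided by two. Here is a minimal Python sketch; the function name `tof_to_distance` is hypothetical, and real sensors add calibration and timing corrections on top of this.

```python
SPEED_OF_LIGHT = 299_792_458.0  # meters per second

def tof_to_distance(round_trip_time_s: float) -> float:
    """Distance (m) to the reflecting surface from the measured round-trip time (s).

    Hypothetical helper for illustration only.
    """
    # The beam covers the sensor-to-obstacle distance twice (out and back)
    return SPEED_OF_LIGHT * round_trip_time_s / 2.0

print(tof_to_distance(200e-9))  # a 200 ns round trip is roughly 30 m
```

This is also why timing precision matters so much: one nanosecond of error corresponds to about 15 cm of distance.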

High-end (and expensive) LiDARs can create rich 3D scenes of an environment, emitting up to 2 million points every second.

Point clouds represent the world in 3D. The LiDAR sensor gets the exact (X,Y,Z) position of every impact point.
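Under the hood, a typical LiDAR measures a range along each beam, together with the beam's horizontal (azimuth) and vertical (elevation) angles; the (X,Y,Z) coordinates then come from a spherical-to-Cartesian conversion. A minimal sketch, assuming a sensor frame with x pointing forward, y to the left and z up, and angles in degrees. The function name `to_xyz` is hypothetical, and real sensors document their own angle conventions and per-laser calibration.

```python
import math

def to_xyz(range_m: float, azimuth_deg: float, elevation_deg: float):
    """Convert one return (range + beam angles) to Cartesian (x, y, z) in meters.

    Hypothetical helper; assumes x forward, y left, z up.
    """
    az = math.radians(azimuth_deg)    # horizontal angle, counterclockwise from x
    el = math.radians(elevation_deg)  # vertical angle above the x-y plane
    horizontal = range_m * math.cos(el)  # length of the beam projected on the x-y plane
    return (horizontal * math.cos(az), horizontal * math.sin(az), range_m * math.sin(el))

# A return 10 m away, 90 degrees to the left, on a laser tilted 2 degrees downward
print(to_xyz(10.0, 90.0, -2.0))
```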

A LiDAR sensor can be solid-state or rotating

A solid-state LiDAR focuses its detection in one direction and offers a limited coverage range (90°, for example). A rotating LiDAR spins around its axis and offers a 360° view; in this case, we want to place it on the roof for better visibility.

LiDARs are rarely used as standalone sensors. They’re often coupled with a camera or a RADAR, in a process called Sensor Fusion. Please find my Sensor Fusion article here to learn more.

The fusion process can be of two types: Early Fusion or Late Fusion. In the first case, the point cloud is fused with the image pixels. In the second case, the individual detections are fused together.
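In early fusion, a common first step is to project every LiDAR point into the camera image so that it can be associated with a pixel. A minimal pinhole-camera sketch, assuming the points have already been transformed into the camera frame (x right, y down, z forward) using the LiDAR-to-camera extrinsic calibration, and that `fx`, `fy`, `cx`, `cy` are the camera intrinsics. The function name is hypothetical.

```python
def project_to_image(points_cam, fx, fy, cx, cy, width, height):
    """Project 3D points (camera frame, meters) to pixels with a pinhole model.

    Hypothetical helper. Returns (u, v, depth) for the points that land inside the image.
    """
    pixels = []
    for x, y, z in points_cam:
        if z <= 0:
            continue  # behind the camera: cannot appear in the image
        u = fx * x / z + cx
        v = fy * y / z + cy
        if 0 <= u < width and 0 <= v < height:
            pixels.append((u, v, z))  # keep the depth so the pixel gets a distance
    return pixels

print(project_to_image([(1.0, 0.5, 10.0), (0.0, 0.0, -5.0)], 500, 500, 320, 240, 640, 480))
# [(370.0, 265.0, 10.0)]
```

Lens distortion is ignored here; with a real camera, we would undistort the image first or use the full calibration model.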

What are the disadvantages of LiDAR sensors?

  • LiDARs cannot directly estimate velocities. They need to compute the difference between two consecutive measurements to do so. RADARs, on the other hand, can estimate the velocity thanks to the Doppler effect.
  • LiDARs don’t work well in bad weather conditions. In fog, the lasers can hit the water droplets and muddle the scene. The same happens with raindrops or dirt.
  • LiDARs are bulky: they can’t be hidden like a camera or a RADAR.
  • Although LiDAR prices are dropping, they are still very high.
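The velocity workaround from the first bullet is simple finite differencing: track the same obstacle over two consecutive frames and divide its displacement by the time between them. A minimal sketch, assuming the obstacle has already been matched between frames (for example through its cluster centroid), which is a tracking problem of its own; the function name and the 10 Hz frame rate are assumptions for illustration.

```python
def estimate_velocity(pos_prev, pos_curr, dt):
    """Velocity vector (m/s) from two (x, y, z) positions measured dt seconds apart.

    Hypothetical helper; assumes both positions belong to the same obstacle.
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    return tuple((c - p) / dt for p, c in zip(pos_prev, pos_curr))

# Assuming a LiDAR spinning at 10 Hz, consecutive frames are 0.1 s apart
print(estimate_velocity((20.0, 0.0, 0.0), (19.0, 0.0, 0.0), 0.1))  # closing in at about 10 m/s along x
```

Any noise in the two positions is divided by a small dt, which amplifies it; a RADAR's direct Doppler measurement avoids this.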

What are the advantages of LiDARs?

  • LiDARs can accurately estimate the position of obstacles. So far, we don’t have a more accurate way to do this.
  • LiDARs work with raw point clouds. If we see points in front of the vehicle, we can stop even if the obstacle detection system didn’t detect anything. It is very reassuring for any customer to know that the vehicle doesn’t rely only on neural networks and probabilities.
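The second advantage can be implemented as a very simple rule that involves no neural network: count the raw points inside a corridor in front of the vehicle, and trigger a stop if there are enough of them. A minimal sketch; the function name, the corridor dimensions, the height band (which assumes z = 0 at road level) and the point-count threshold are all hypothetical values chosen for illustration.

```python
def obstacle_in_corridor(points, length=20.0, half_width=1.5,
                         z_min=0.3, z_max=2.5, min_points=5):
    """True if at least `min_points` raw points lie in the box ahead of the vehicle.

    Hypothetical helper. Frame assumption: x forward, y left, z up, z = 0 at road level.
    The height band skips road returns below z_min and overhanging structures above z_max.
    """
    count = sum(1 for x, y, z in points
                if 0.0 < x <= length and abs(y) <= half_width and z_min <= z <= z_max)
    # Requiring several points, not just one, makes the rule less sensitive to isolated noise
    return count >= min_points
```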

In this article, I will cover the obstacle detection process (late fusion) that generally happens in 4 steps:

  1. Point cloud processing
  2. Point cloud segmentation
  3. Obstacle clustering
  4. Bounding box fitting

Point Cloud Processing — Voxel Grid

To process point clouds, we can use the most popular library, PCL (Point Cloud Library). It has Python bindings, but it makes more sense to use it in C++, the language it is written in and one that is well suited to robotics. It also integrates with ROS (Robot Operating System).

PCL can do most of the computations necessary to detect obstacles, from loading the points to executing algorithms. It is to point clouds what OpenCV is to images.

Since the output of LiDAR can easily be 100,000 points per second, we need to use something called a voxel grid to downsample the results.

What is a Voxel Grid?

A voxel grid divides space into small 3D cubes (voxels) and filters the point cloud by leaving only one point per cube (in PCL, the centroid of the points inside it).

The bigger the cube, the lower the final resolution of the point cloud.

In the end, we can downsample our cloud from 100,000 points to only a few thousand.
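Here is a minimal voxel grid filter in plain Python. Like PCL's `VoxelGrid` filter, it replaces all the points that fall in the same cube by their centroid. The function name `voxel_downsample` and the `leaf_size` parameter (the cube side, in meters) are illustrative.

```python
import math
from collections import defaultdict

def voxel_downsample(points, leaf_size):
    """Keep one point (the centroid) per cubic voxel of side `leaf_size` meters.

    Hypothetical helper for illustration.
    """
    voxels = defaultdict(list)
    for x, y, z in points:
        # Integer voxel index along each axis (floor also handles negative coordinates)
        key = (math.floor(x / leaf_size), math.floor(y / leaf_size), math.floor(z / leaf_size))
        voxels[key].append((x, y, z))
    downsampled = []
    for pts in voxels.values():
        n = len(pts)
        # Average each coordinate over the points in this voxel
        downsampled.append(tuple(sum(coords) / n for coords in zip(*pts)))
    return downsampled
```

With `leaf_size=0.2`, for example, every 20 cm cube keeps a single point; a bigger leaf size gives a coarser and lighter cloud.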

ROI (Region of Interest)

The second operation we can perform is ROI (region of interest) filtering. We simply remove every point that isn’t part of a specific region, for example 10 meters on each side and 100 meters ahead.
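ROI filtering is just a box test on each point. A minimal sketch using the numbers above, assuming x points forward, y to the left and z up; the function name and the height limits are hypothetical choices (PCL offers a similar `CropBox` filter).

```python
def crop_box(points, min_pt=(0.0, -10.0, -3.0), max_pt=(100.0, 10.0, 3.0)):
    """Keep only the points inside the axis-aligned box [min_pt, max_pt] (meters).

    Hypothetical helper; the default limits are illustrative.
    """
    return [p for p in points
            if all(lo <= c <= hi for c, lo, hi in zip(p, min_pt, max_pt))]

cloud = [(5.0, 0.0, 0.0), (150.0, 0.0, 0.0), (5.0, 20.0, 0.0), (-1.0, 0.0, 0.0)]
print(crop_box(cloud))  # [(5.0, 0.0, 0.0)]
```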

3D Segmentation — RANSAC

RANSAC

A very popular method used for segmentation is called RANSAC (RANdom SAmple Consensus). The goal of the algorithm is to fit a model to a set of points that contains many outliers, and to tell the inliers apart from the outliers.

A point cloud generally represents several shapes: some are obstacles, and some are simply reflections from the ground. RANSAC’s goal is to identify the ground points and separate them from the others by fitting a plane (or, in 2D, a line).

To fit a line, we could think of a linear regression. But with so many outliers, the regression would average all the points and miss the line. Unlike a linear regression, RANSAC identifies these outliers and doesn’t fit them.

We can consider this line to be the scene’s target path (i.e. a road), and the outliers to be the obstacles.

How does it work?

The process is as follows:

  1. Pick 2 points at random
  2. Fit a linear model to these points
  3. Calculate the distance from every other point to the fitted line. If the distance is within a defined distance tolerance, we add the point to the list of inliers.

We repeat these steps for a fixed number of iterations. In the end, the iteration with the most inliers is selected as the model; the remaining points are outliers. This way, we can consider every inlier to be part of the road, and every outlier to be part of an obstacle.
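These steps, repeated for a fixed number of iterations, fit in a few lines of Python. A minimal 2D sketch; the function name is hypothetical, `iterations` and `tolerance` are tuning parameters with arbitrary defaults, and `seed` only makes runs reproducible.

```python
import math
import random

def ransac_line(points, iterations=100, tolerance=0.2, seed=None):
    """Return the indices of the inliers of the best 2D line found by RANSAC.

    Hypothetical helper for illustration.
    """
    rng = random.Random(seed)
    best = []
    for _ in range(iterations):
        # 1. Pick 2 distinct points at random
        i, j = rng.sample(range(len(points)), 2)
        (x1, y1), (x2, y2) = points[i], points[j]
        # 2. Line through them, written as a*x + b*y + c = 0
        a, b = y1 - y2, x2 - x1
        c = x1 * y2 - x2 * y1
        norm = math.hypot(a, b)
        if norm == 0:
            continue  # identical points: no line defined
        # 3. Every point closer to the line than the tolerance is an inlier
        inliers = [k for k, (x, y) in enumerate(points)
                   if abs(a * x + b * y + c) / norm <= tolerance]
        # Keep the line with the most inliers
        if len(inliers) > len(best):
            best = inliers
    return best
```

The number of iterations is a trade-off: the more outliers in the cloud, the more random samples we need before we are likely to pick two points from the road.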

RANSAC also works in 3D. In this case, we pick 3 points at random, fit a plane through them, and calculate the distance from every other point to that plane.
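In 3D, only the model changes: three random points define a plane whose normal is the cross product of two in-plane vectors, and the point-to-plane distance replaces the point-to-line distance. A minimal sketch; the function name and parameters are hypothetical, with arbitrary defaults (in PCL, the same segmentation is available through `SACSegmentation` with a plane model).

```python
import math
import random

def ransac_plane(points, iterations=100, tolerance=0.1, seed=None):
    """Return the indices of the inliers of the best 3D plane found by RANSAC.

    Hypothetical helper for illustration.
    """
    rng = random.Random(seed)
    best = []
    for _ in range(iterations):
        p1, p2, p3 = (points[i] for i in rng.sample(range(len(points)), 3))
        # Two vectors lying in the plane
        u = [p2[k] - p1[k] for k in range(3)]
        v = [p3[k] - p1[k] for k in range(3)]
        # Plane normal (a, b, c) = u x v
        a = u[1] * v[2] - u[2] * v[1]
        b = u[2] * v[0] - u[0] * v[2]
        c = u[0] * v[1] - u[1] * v[0]
        norm = math.sqrt(a * a + b * b + c * c)
        if norm == 0:
            continue  # collinear sample: no unique plane
        d = -(a * p1[0] + b * p1[1] + c * p1[2])
        # Point-to-plane distance |a*x + b*y + c*z + d| / |n|
        inliers = [k for k, (x, y, z) in enumerate(points)
                   if abs(a * x + b * y + c * z + d) / norm <= tolerance]
        if len(inliers) > len(best):
            best = inliers
    return best
```

The inliers are the road points; everything else is passed on to clustering.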

Here is the result of a RANSAC algorithm on the point cloud. The purple region represents the vehicle.

Clustering — Euclidean & KD Tree

Clustering is a technique that separates groups of points based on their distances. Consider the previous image, where we have a few visible obstacles: we need the algorithm to understand by itself that there are in fact multiple cars, and to assign one color per obstacle.

How does it work?

Clustering is a family of machine learning algorithms, including k-means (the most popular), DBSCAN, HDBSCAN, and more.

We can simply go with Euclidean clustering and calculate the Euclidean distance between points.

The process is as follows:

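  • Pick a point that doesn’t belong to any cluster yet, and start a new cluster with it.
  • Add to the cluster every point within the distance tolerance of a point already in the cluster, and repeat the search from the newly added points.
  • When no more points can be added, the cluster is complete: pick another unassigned point and repeat.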
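A minimal Euclidean clustering sketch that grows each cluster outward from a seed point, as in the steps above. It uses brute-force distance checks on purpose, which is exactly the cost problem discussed next. The function name is hypothetical; `tolerance` (meters) and `min_size` (to drop isolated noise points) are illustrative parameters.

```python
import math

def euclidean_clusters(points, tolerance, min_size=1):
    """Group points connected by chains of neighbors closer than `tolerance`.

    Hypothetical helper. Brute-force neighbor search: O(n^2) distance computations.
    """
    n = len(points)
    visited = [False] * n
    clusters = []
    for seed in range(n):
        if visited[seed]:
            continue
        visited[seed] = True
        cluster, frontier = [], [seed]
        while frontier:
            i = frontier.pop()
            cluster.append(i)
            # Check every other point: this loop is what a KD tree speeds up
            for j in range(n):
                if not visited[j] and math.dist(points[i], points[j]) <= tolerance:
                    visited[j] = True
                    frontier.append(j)
        if len(cluster) >= min_size:
            clusters.append(sorted(cluster))
    return clusters
```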

Computations & KD Tree

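The problem with the above technique is that a LiDAR sensor can output 100,000 points. Finding the neighbors of a single point would mean 100,000 Euclidean distance calculations, and we need to do it for every point. To avoid calculating distances for every single point, we can use a KD tree, which organizes the points by recursively splitting the space along one axis at a time.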

Take the scenario above, where the orange point at the bottom isn’t within the distance tolerance. Because the tree splits the space along one axis at a time, we can skip every point on the right side of this orange point: they are even farther away along that axis, so we’re sure they won’t be within the tolerance either. Then we can take another point, calculate the distance, and repeat.
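A minimal 3D KD tree sketch: each level splits on x, then y, then z, and the radius search only descends into a side of a split when the tolerance can reach it, which is the pruning described above. This is a hand-rolled illustration, not PCL's implementation (PCL provides its own, such as `pcl::search::KdTree`); the class and method names are hypothetical, and inserting points in arbitrary order can produce an unbalanced tree.

```python
import math

class KDNode:
    def __init__(self, point, index):
        self.point = point
        self.index = index
        self.left = None   # points with a smaller coordinate on this node's axis
        self.right = None  # points with a greater or equal coordinate

class KDTree:
    """Hypothetical minimal 3D KD tree: a node at depth d splits on axis d % 3."""

    def __init__(self, points):
        self.root = None
        for i, p in enumerate(points):
            self._insert(p, i)

    def _insert(self, point, index):
        node = KDNode(point, index)
        if self.root is None:
            self.root = node
            return
        cur, depth = self.root, 0
        while True:
            axis = depth % 3
            side = "left" if point[axis] < cur.point[axis] else "right"
            child = getattr(cur, side)
            if child is None:
                setattr(cur, side, node)
                return
            cur, depth = child, depth + 1

    def radius_search(self, target, tol):
        """Indices of all stored points within `tol` (Euclidean) of `target`."""
        found = []
        stack = [(self.root, 0)]
        while stack:
            node, depth = stack.pop()
            if node is None:
                continue
            if math.dist(node.point, target) <= tol:
                found.append(node.index)
            axis = depth % 3
            # Prune: visit a side only if the split plane is within `tol` of the target
            if target[axis] - tol < node.point[axis]:
                stack.append((node.left, depth + 1))
            if target[axis] + tol >= node.point[axis]:
                stack.append((node.right, depth + 1))
        return found
```

In a clustering loop, `radius_search` replaces the scan over all points, so each neighbor query only visits the parts of the tree that can contain neighbors.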

Bounding Boxes

The final objective is to create a 3D bounding box around each cluster.
This part is not particularly difficult, but without a class label we cannot assume anything about an obstacle’s size: since we did not make any classification, we must fit each bounding box to its points.
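The simplest fit is an axis-aligned box spanning the minimum and maximum coordinates of each cluster (PCL's `getMinMax3D` function computes these two corners). A minimal sketch; the function name is hypothetical.

```python
def axis_aligned_box(cluster):
    """Return (min_corner, max_corner) of the smallest axis-aligned box around the points.

    Hypothetical helper for illustration.
    """
    xs, ys, zs = zip(*cluster)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))

print(axis_aligned_box([(1.0, 2.0, 0.0), (3.0, 1.0, 1.5), (2.0, 4.0, 0.5)]))
# ((1.0, 1.0, 0.0), (3.0, 4.0, 1.5))
```

Such a box ignores the obstacle's heading, so it can be much larger than the object when a car is seen at an angle.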

One algorithm that can help fit bounding boxes is principal component analysis (PCA).

Using PCA, we can orient the bounding box along the main directions of the points, so that it fits the cluster tightly instead of being aligned with the sensor’s axes. This will help with parked cars, for example, where the detection is only partial.
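A minimal sketch of a PCA-oriented box, under two simplifying assumptions: obstacles only rotate around the vertical axis (so PCA runs on the x-y coordinates only, a bird's-eye view), and z points up. The principal direction of a 2x2 covariance matrix has a closed-form angle, so no linear algebra library is needed. The function name and the returned tuple layout are hypothetical.

```python
import math

def pca_box_bev(points):
    """Fit an oriented 3D box to a cluster with PCA on the x-y plane.

    Hypothetical helper. Returns (cx, cy, cz, length, width, height, yaw),
    assuming z is up and the box only rotates around z.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # 2x2 covariance of the x-y coordinates
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Angle of the principal eigenvector (direction of largest variance)
    yaw = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    c, s = math.cos(yaw), math.sin(yaw)
    # Express each point in the box frame (rotate by -yaw around the mean)
    us = [(p[0] - mx) * c + (p[1] - my) * s for p in points]
    vs = [-(p[0] - mx) * s + (p[1] - my) * c for p in points]
    zs = [p[2] for p in points]
    umin, umax, vmin, vmax = min(us), max(us), min(vs), max(vs)
    # Middle of the extents, rotated back into the sensor frame
    uc, vc = (umin + umax) / 2, (vmin + vmax) / 2
    cx = mx + uc * c - vc * s
    cy = my + uc * s + vc * c
    cz = (min(zs) + max(zs)) / 2
    return cx, cy, cz, umax - umin, vmax - vmin, max(zs) - min(zs), yaw
```

The yaw is only defined up to 180°: PCA finds the main axis of the points, not which way the car is facing.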

Results

Here’s the result of my project from the Sensor Fusion Nanodegree Program. You can access the code here.

LiDAR is a very powerful and reliable sensor that’s used a lot in robotics. Today, we can go even further and feed point clouds to 3D neural networks that output the bounding boxes directly.

For a while, LiDAR technology has been criticised for its cumbersome size and its price, making it an elite sensor.

Recently, Apple added a LiDAR sensor to its new iPad Pro, bringing the price barrier down to under $1,000.

With the price dropping, it might become accessible even to independent developers.

As LiDAR becomes more available, the skills to work with this sensor will become a real plus for an engineer! Sensor fusion is also a fascinating topic, and it only makes sense once you master LiDAR + camera or LiDAR + RADAR detection.


Fritz

Our team has been at the forefront of Artificial Intelligence and Machine Learning research for more than 15 years and we're using our collective intelligence to help others learn, understand and grow using these new technologies in ethical and sustainable ways.
