Humans have two eyes, which allow us to see two viewpoints of a scene. Using the complementary information from these two viewpoints, we are able to perceive depth in the world around us.
Researchers have long tried to exploit the complementary information present in two viewpoints. This can be seen in a wide variety of industrial applications, such as robotics and self-driving cars, where multiple cameras are used.
More recently, with the mass-market adoption of dual-camera mobile phones, the applications have shifted from industrial to consumer mobile photography use cases.
A common user-facing application that uses stereo information to enhance mobile phone photography is Portrait Mode, which can be seen on both Apple’s and Google’s flagship devices. By leveraging stereo information, mobile phones are able to calculate the depth of a scene.
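To make the link between stereo and depth concrete, here is a minimal sketch of the standard pinhole relation for a rectified stereo pair: depth = focal length × baseline / disparity. The function name and all numbers are illustrative assumptions, not values from any particular phone, and real Portrait Mode pipelines are considerably more involved.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in metres of a point seen by a rectified stereo pair.

    disparity_px:    horizontal shift of the point between the two views (pixels)
    focal_length_px: camera focal length expressed in pixels
    baseline_m:      distance between the two camera centres (metres)
    """
    if disparity_px <= 0:
        # Zero disparity means the point is effectively at infinity.
        return float("inf")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical camera: 1000 px focal length, 1 cm baseline.
near = depth_from_disparity(disparity_px=40.0, focal_length_px=1000.0, baseline_m=0.01)
far = depth_from_disparity(disparity_px=4.0, focal_length_px=1000.0, baseline_m=0.01)
print(near, far)
```

Larger disparities correspond to closer objects, which is why a short baseline, as on a phone, makes distant depth hard to resolve.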
As of today, most well-known stereo image datasets are still geared towards specific industrial domains, such as autonomous vehicles. Those that are not are usually extremely small, and hence do not allow algorithms to generalize sufficiently to mobile phone scenarios. As one can imagine, a (deep learning) algorithm that has only seen roads and vehicles would struggle to generalize to a kitchen scene or a group of people enjoying the beach. Hence, there is a growing need for a large-scale dataset of in-the-wild scenes that allows algorithms to generalize to mobile photography.
This is where Holopix50k fits in — the largest available public dataset of in-the-wild stereo image pairs, almost 50 times larger in size than similar datasets.
In this article, we will look at the usefulness of this dataset, and some practical applications for it!
Holopix50k is a large-scale dataset of 49,368 (~50k) stereoscopic image pairs collected from the popular Lightfield social media app Holopix™. It is the largest publicly released dataset of stereoscopic image pairs that contains in-the-wild scenarios captured from mobile phones. For context, the second-largest dataset in this category consists of only 1,024 stereoscopic image pairs, almost 50 times fewer! The dataset is available for immediate download on the project page and also has an associated research paper.
A word-cloud analysis of the dataset using COCO classes shows that the dataset contains good diversity among common objects such as people, vehicles, furniture, etc.
The dataset is extremely diverse in various senses. For example:
- It has both portrait and landscape orientation images
- It has images at 720p (HD) resolution as well as 360p (SD) resolution
- It contains a good mix of indoor and outdoor scenes, synthetic scenes, daytime and night-time captures, etc.
- It has unique captures such as fire, parties, snow, rain, toys, selfies, etc.
When compared to other popular datasets, Holopix50k also fares well in various metric scores. The chart below illustrates this:
Applications of the Holopix50k dataset
With such a large dataset now available, the next question one might ask is — what applications is it useful for? In this section, we’ll go over some of the applications of the dataset that were covered in the paper, as well as some additional ones!
Super-resolution is a well-known area of research that involves increasing the resolution of an image beyond that of the original. Stereo super-resolution techniques incorporate a second low-resolution image of the same scene to improve the quality of the super-resolved high-resolution image.
The paper shows that training the PASSRNet model on Holopix50k drastically improves its results on this task compared with training on other datasets.
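As a rough illustration of how a stereo pair can become a super-resolution training example, the sketch below downsamples both views and keeps the original left view as the target. This is an assumption about a typical setup, not the paper's pipeline: the helper names are hypothetical, images are plain lists of grayscale rows, and a simple box filter stands in for the bicubic downsampling that is more commonly used.

```python
def downsample(image, scale):
    """Box-filter downsample of a 2D grayscale image given as a list of rows."""
    height, width = len(image), len(image[0])
    out = []
    # Ignore trailing rows/columns that do not fill a whole block.
    for r in range(0, height - height % scale, scale):
        row = []
        for c in range(0, width - width % scale, scale):
            block = [image[r + i][c + j] for i in range(scale) for j in range(scale)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def make_training_example(left_hr, right_hr, scale=2):
    """Pair two low-resolution inputs with the high-resolution left target."""
    return {
        "inputs": (downsample(left_hr, scale), downsample(right_hr, scale)),
        "target": left_hr,
    }


# Toy 4x4 "images" standing in for a real stereo pair.
left_hr = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
right_hr = [[float(r * 4 + c + 1) for c in range(4)] for r in range(4)]
example = make_training_example(left_hr, right_hr, scale=2)
left_lr, right_lr = example["inputs"]
```

The model then sees both low-resolution views and is asked to reproduce the high-resolution left view, so the second view can supply detail that the first one lost.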
Monocular Depth Estimation
This area of research aims to estimate the depth of a scene from a single image. Usually, stereo image pairs are used to supervise the network during training, while only one image is needed at inference time.
The paper shows that training on Holopix50k can improve the performance of state-of-the-art models on real-world datasets.
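One common form of the stereo supervision mentioned above is a photometric reconstruction loss: the network predicts a disparity map for the left image, the right image is warped with it, and the result is compared with the real left image. The sketch below shows this idea on a single scanline with linear interpolation. It is a simplified illustration with hypothetical function names, not the training code used in the paper.

```python
def reconstruct_left_row(right_row, disparity_row):
    """Rebuild one left-image scanline by sampling the right image.

    For a rectified pair, the pixel at column x in the left image appears
    at column x - d in the right image, where d is the disparity at x.
    """
    width = len(right_row)
    out = []
    for x, d in enumerate(disparity_row):
        src = min(max(x - d, 0.0), width - 1.0)  # clamp to the image bounds
        x0 = int(src)
        x1 = min(x0 + 1, width - 1)
        w = src - x0  # linear interpolation weight
        out.append((1.0 - w) * right_row[x0] + w * right_row[x1])
    return out


def photometric_l1(left_row, right_row, disparity_row):
    """Mean absolute difference between the real and reconstructed left row."""
    recon = reconstruct_left_row(right_row, disparity_row)
    return sum(abs(a - b) for a, b in zip(left_row, recon)) / len(left_row)


# Toy scanline: the right view is the left view shifted by 2 pixels.
left = [0.0, 0.0, 0.0, 0.0, 1.0, 0.5, 0.0, 0.0]
right = left[2:] + [0.0, 0.0]

good_loss = photometric_l1(left, right, [2.0] * len(left))  # correct disparity
bad_loss = photometric_l1(left, right, [0.0] * len(left))   # wrong disparity
```

The correct disparity reconstructs the left scanline and gives a lower loss than the wrong one, which is the signal that lets a network learn depth without ground-truth depth maps.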
Other interesting tasks include stereoscopic style transfer, stereo flow estimation, and stereo segmentation. All of these tasks could benefit from the high-quality, in-the-wild data provided by Holopix50k, enabling significant model generalization.
The Holopix50k dataset is a breakthrough among stereo vision datasets, combining highly diverse scenes with a very large number of image pairs.
Quoting the paper:
The Holopix50k dataset is currently available for download, and is free for research and non-commercial usage.