What is Super Resolution?
While writing an article or creating a YouTube video, have you ever wanted to use a certain image, but its resolution was too low and blurry, so you settled for a less suitable but crisper image instead?
Machine learning-based super resolution might be the solution to this problem.
In its essence, super resolution in machine learning refers to models that take a low-resolution image as input and produce an upscaled, clear, high-resolution image as the output.
In more technical terms, when we apply a degradation function to a high-resolution (HR) image, we get a low-resolution (LR) image—i.e. LR = degradation(HR). We train a model with the HR image as our target and the LR image as the input.
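The LR = degradation(HR) relationship can be sketched with a toy degradation function. Real pipelines typically use bicubic downsampling, often combined with blur and noise; the block-averaging below is an illustrative stand-in:

```python
import numpy as np

def degrade(hr, scale=2):
    """Toy degradation: average each `scale` x `scale` block of pixels.
    Real pipelines usually use bicubic downsampling plus blur/noise."""
    h = hr.shape[0] // scale * scale
    w = hr.shape[1] // scale * scale
    hr = hr[:h, :w]
    return hr.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))

hr = np.arange(16, dtype=float).reshape(4, 4)  # pretend 4x4 HR image
lr = degrade(hr, scale=2)                      # 2x2 LR image
```

Training pairs are then simply (lr, hr): the model receives `lr` and is penalized for differing from `hr`.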
Today, image super resolution has many potential uses, ranging from simple tasks such as upscaling an image for the web to more complicated use cases in security and intelligence gathering that require adding finer details to images.
Machine learning has an advantage here because traditional algorithm-based approaches are not effective enough: they fail to recover fine details, which in many use cases renders their output useless.
How does it work?
In technical terms, super resolution is an ill-posed problem because for a single degraded image, there are multiple possible upscaled (HR) images. Simple algorithm-based approaches use the local information in an LR image to compute the corresponding HR image. These have been used for a long time, but the results they produce lack detail and sharpness.
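A minimal example of such a non-learned, purely local approach is nearest-neighbor upscaling, which just replicates each pixel and therefore cannot invent any new detail:

```python
import numpy as np

def nearest_upscale(lr, scale=2):
    """Classic non-learned upscaling: replicate each pixel `scale` times
    along both axes. Only local information is used, so the result is
    blocky and no fine detail is recovered."""
    return np.repeat(np.repeat(lr, scale, axis=0), scale, axis=1)

lr = np.array([[1.0, 2.0], [3.0, 4.0]])
hr_guess = nearest_upscale(lr)  # 4x4, but each LR pixel becomes a flat block
```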
In supervised machine learning approaches, the model is trained to learn the mapping functions from LR to HR images on a large dataset. If the degradation function used is known, a large training set of LR and HR images can be created, since LR images can be directly extracted from HR images already available.
If the degradation function is unknown, the collection of a training set is a difficult task because pairs of already-existing HR and LR images are needed. In this case, unsupervised learning may be used to approximate the degradation function.
The mapping function learned by the model is the inverse of the degradation function applied on the HR image.
Model Training and Design
Many different approaches have been applied to train super-resolution models using various model architectures. These approaches differ in terms of which stage in the network the upsampling is done.
Earlier super-resolution models used a pre-upsampling approach, in which the LR images are first upscaled to coarse HR images using traditional algorithms, and then CNNs are used to learn the mappings from these coarse HR images to the desired HR images.
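A pre-upsampling network can be sketched as below in PyTorch, in the spirit of SRCNN. The layer sizes are illustrative, not taken from any specific paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PreUpsampleSR(nn.Module):
    """Pre-upsampling sketch: interpolate the LR input to HR size first,
    then refine the coarse result with a small CNN (SRCNN-style).
    Channel and kernel sizes here are illustrative."""
    def __init__(self, scale=2):
        super().__init__()
        self.scale = scale
        self.body = nn.Sequential(
            nn.Conv2d(3, 64, 9, padding=4), nn.ReLU(),
            nn.Conv2d(64, 32, 5, padding=2), nn.ReLU(),
            nn.Conv2d(32, 3, 5, padding=2),
        )

    def forward(self, lr):
        # Traditional bicubic upscaling produces the coarse HR image...
        coarse = F.interpolate(lr, scale_factor=self.scale,
                               mode="bicubic", align_corners=False)
        # ...and the CNN learns the mapping from coarse HR to desired HR.
        return self.body(coarse)

sr = PreUpsampleSR(scale=2)(torch.randn(1, 3, 32, 32))  # HR-sized output
```

Note that every convolution runs at HR resolution, which makes this approach comparatively expensive.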
In the post-upsampling approach, the LR images are passed to the CNNs and the upsampling layer is at the end of the network. The upsampling layer is learnable and trained together with the preceding convolution layers in an end-to-end manner.
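A post-upsampling counterpart might look like the sketch below, using a learnable sub-pixel (PixelShuffle) layer at the end of the network; again, the layer sizes are illustrative:

```python
import torch
import torch.nn as nn

class PostUpsampleSR(nn.Module):
    """Post-upsampling sketch: all feature extraction runs at LR
    resolution, and a learnable sub-pixel (PixelShuffle) layer performs
    the upsampling at the very end of the network."""
    def __init__(self, scale=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        self.up = nn.Sequential(
            # Produce scale^2 channels per output channel, then rearrange
            # them spatially: (B, 3*s*s, H, W) -> (B, 3, H*s, W*s).
            nn.Conv2d(64, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, lr):
        return self.up(self.features(lr))

sr = PostUpsampleSR(scale=2)(torch.randn(1, 3, 32, 32))
```

Because the convolutions operate on the small LR feature maps, this design is cheaper than pre-upsampling, and the upsampling layer itself is trained end-to-end with the rest of the network.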
The pre- and post-upsampling approaches are the most widely used; however, others have been experimented with, such as progressive upsampling and iterative up-and-down sampling, both of which are more effective but at the same time more complex.
Along with these upsampling approaches, the network design and the types of convolutions used are also very important. Super-resolution models require that the information in the LR image is preserved in the HR image—therefore, they mainly focus on the residual between the LR and HR images, which is why residual network designs with skip connections are used in most of these networks.
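A residual block with a skip connection, in the spirit of EDSR-style networks, can be sketched as:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual block sketch: the skip connection carries the input
    through unchanged, so the convolutions only need to learn the
    residual (the detail to add), not the whole image."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)  # skip connection: output = input + residual

x = torch.randn(1, 64, 16, 16)
y = ResidualBlock()(x)  # same shape as x
```

Because the identity path preserves the LR content by construction, the network's capacity is spent entirely on the high-frequency detail it must add.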
Loss functions, in ML terms, measure the difference between the target and predicted values and guide model updates that reduce this difference.
In super-resolution models, various loss functions are used in combination to obtain the desirable results. These combinations of loss functions enable the model to learn from various aspects of image quality.
Pixel-wise loss is the simplest loss function for this task, in which each pixel in the generated image is compared to the corresponding pixel in the ground-truth HR image. Along with this, image quality is compared on the basis of perceptual quality using a content loss function. For a generated image to have the same style and texture as the original one, a texture loss function is used, in which texture is defined as the correlation between different feature channels.
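The three losses above can be combined as a weighted sum. In the sketch below, the content and texture terms operate on feature maps that would normally come from a pretrained network such as VGG; the feature extractor and the loss weights are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def gram_matrix(feat):
    """Channel-correlation (Gram) matrix used by the texture loss:
    measures how strongly different feature channels co-occur."""
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def combined_loss(sr, hr, feat_sr, feat_hr,
                  w_pix=1.0, w_content=0.1, w_texture=0.1):
    """Weighted sum of pixel, content, and texture losses.
    `feat_sr`/`feat_hr` would come from a pretrained feature extractor
    (e.g. VGG); the weights here are illustrative, not canonical."""
    pixel = F.l1_loss(sr, hr)                       # pixel-wise loss
    content = F.mse_loss(feat_sr, feat_hr)          # perceptual content loss
    texture = F.mse_loss(gram_matrix(feat_sr),      # texture (style) loss
                         gram_matrix(feat_hr))
    return w_pix * pixel + w_content * content + w_texture * texture

sr, hr = torch.rand(1, 3, 8, 8), torch.rand(1, 3, 8, 8)
feat = torch.rand(1, 16, 4, 4)
loss = combined_loss(sr, hr, feat, feat)  # content/texture terms vanish here
```

Weighting the terms lets the training balance pixel fidelity against perceptual quality: pixel loss alone tends to produce blurry averages, while the content and texture terms push the output toward sharper, more natural-looking detail.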
In this article, I introduced the concept of super-resolution machine learning models and discussed how they work at a high level. There are several state-of-the-art techniques that are more advanced and the subject of active, extensive research, but the core idea of super resolution remains the same for all of them.
Furthermore, active research is needed in fields such as unsupervised super resolution, GAN-based super resolution, and better evaluation metrics to improve the quality of super resolution models.
Read the following for more details about the methods explained here: