Image and video compression techniques have greatly advanced in recent years. However, most compression techniques still can’t handle the massive growth in media data.
Read on to learn how deep learning (DL) can help solve the challenges of traditional compression frameworks.
Brief Overview of Image and Video Compression
Compression enables the delivery of high-quality images and video under the limitations of transmission networks and storage. Essentially, compression removes redundant information from media files.
For example, spatial redundancy refers to duplicated or near-identical pixels within a still image. To reduce spatial redundancy, compression techniques use fewer bits to represent similar pixels.
Early image compression methods like Huffman coding reduced statistical redundancy with entropy coding techniques. Later, spatial frequency encoding methods like the Discrete Cosine Transform (DCT) concentrated image energy in the low-frequency coefficients.
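As a rough illustration of entropy coding, the sketch below builds a Huffman code for a short list of pixel values. The function name `huffman_code` and the sample pixel list are hypothetical, chosen only for this example; real codecs combine entropy coding with other steps.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Return a {symbol: bitstring} prefix code built from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, codes1 = heapq.heappop(heap)  # two least frequent subtrees
        w2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

# Hypothetical sample: a mostly dark strip with a few brighter pixels.
pixels = [0] * 12 + [255] * 3 + [128]
code = huffman_code(pixels)
total_bits = sum(len(code[p]) for p in pixels)
print(code, total_bits)
```

Frequent values receive short codes, so the 16 sample pixels need fewer bits in total than a fixed-length code would use.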
Compression standards like JPEG leverage the DCT domain to identify areas in an image that humans consider to be the same. The human eye cannot distinguish between all possible colors, and it is not very sensitive to rapid, high-frequency changes in intensity.
Intense changes and small, sharp details in an image become high-frequency coefficients in the DCT. Since the human eye barely perceives these frequencies, the compression process quantizes them coarsely or discards them. As a result, you can represent what looks like the same image with fewer bits, for example a 16-bit color space instead of 24-bit.
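The idea of discarding high-frequency DCT coefficients can be sketched on a single 8x8 block. This is a simplified illustration, not the JPEG algorithm: JPEG uses quantization tables rather than the hard cutoff assumed here, and the helper names (`dct2`, `idct2`, `keep_low_frequencies`) and the sample block are hypothetical.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def dct2(block):
    """2-D DCT of a square block: C * block * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))

def idct2(coeffs):
    """Inverse 2-D DCT: C^T * coeffs * C."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)

def keep_low_frequencies(coeffs, cutoff):
    """Zero every coefficient whose row + column index is >= cutoff."""
    return [[value if row + col < cutoff else 0.0
             for col, value in enumerate(line)]
            for row, line in enumerate(coeffs)]

# Hypothetical smooth block: a gentle horizontal brightness ramp.
block = [[100 + 4 * col for col in range(8)] for _ in range(8)]
truncated = keep_low_frequencies(dct2(block), cutoff=4)
restored = idct2(truncated)
max_error = max(abs(restored[r][c] - block[r][c])
                for r in range(8) for c in range(8))
print(f"largest pixel error after dropping high frequencies: {max_error:.3f}")
```

For a smooth block like this one, keeping at most 10 of the 64 coefficients still reconstructs every pixel closely, which is why the high frequencies are cheap to drop.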
Video compression methods are based on the idea that successive frames are highly correlated. Many compression standards like MPEG and H.264 reduce the temporal redundancy between frames to reduce the size of a video file.
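A minimal sketch of temporal redundancy, assuming plain frame differencing: store the first frame, then store only what changed. Standards such as MPEG and H.264 use motion-compensated prediction, which is considerably more elaborate; the function names and the tiny 4x4 frames below are hypothetical.

```python
def frame_residual(previous, current):
    """Per-pixel difference between two frames of equal size."""
    return [[c - p for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(previous, current)]

def reconstruct(previous, residual):
    """Rebuild the current frame from the previous frame plus the residual."""
    return [[p + r for p, r in zip(prev_row, res_row)]
            for prev_row, res_row in zip(previous, residual)]

# Hypothetical frames: one bright pixel moves one step to the right.
frame_a = [[10] * 4 for _ in range(4)]
frame_b = [[10] * 4 for _ in range(4)]
frame_a[1][1] = 200
frame_b[1][2] = 200

residual = frame_residual(frame_a, frame_b)
changed = sum(1 for row in residual for value in row if value != 0)
print(f"{changed} of 16 residual values are non-zero")
```

Because most residual values are zero, the residual is far cheaper to entropy-code than the full second frame.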
More recently, convolutional neural networks (CNNs) have achieved significant success in many fields, including image and video compression. You can use a CNN as a feature extractor whose learned representations benefit image and video compression.
Image Compression with Deep Learning
DL-based image compression originated in the late 1980s with techniques like multi-layer perceptrons (MLPs), random neural networks, and convolutional neural networks (CNNs). The following image compression methods are listed in the chronological order in which the underlying neural network techniques were developed.
Multi-Layer Perceptron (MLP)-based compression
An MLP consists of an input layer of neurons, several hidden layers of neurons, and a final layer of output neurons. Theoretical research has shown that an MLP constructed with more than one hidden layer can be useful in scenarios such as data compression and dimension reduction. The idea behind MLP image compression is to apply a unitary transformation to the whole spatial data.
The first MLP image compression algorithm was released in 1988. The algorithm took into account traditional image compression steps like spatial domain transformation, quantization, and binary coding as an integrated optimization problem.
Then, the algorithm utilized a decomposition neural network to find the optimal binary code combination in the output of the compressed bitstream. However, with this strategy, a network with fixed parameters cannot adapt to a variable compression ratio.
In the following years, the algorithm was further improved using predictive techniques, which approximate the value of a pixel based on the values of its neighboring pixels. Then the MLP model minimizes the mean square error between the original and predicted pixels by utilizing backpropagation.
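The predictive idea can be sketched with a deliberately tiny model. As a simplifying assumption, a single linear unit stands in for the MLP: it predicts each pixel from its left and upper neighbors and is trained by gradient descent on the mean squared error. The synthetic image, function names, learning rate, and step count are all hypothetical.

```python
def make_image(height=8, width=8):
    """Synthetic smooth ramp, scaled to the range [0, 1]."""
    top = (height - 1) + 2 * (width - 1)
    return [[(i + 2 * j) / top for j in range(width)] for i in range(height)]

def training_pairs(img):
    """((left neighbor, upper neighbor), pixel) for every pixel that has both."""
    return [((img[i][j - 1], img[i - 1][j]), img[i][j])
            for i in range(1, len(img)) for j in range(1, len(img[0]))]

def mse(params, pairs):
    w_left, w_up, bias = params
    return sum((w_left * l + w_up * u + bias - target) ** 2
               for (l, u), target in pairs) / len(pairs)

def train(pairs, steps=500, lr=0.1):
    """Full-batch gradient descent on the MSE of the linear predictor."""
    w_left = w_up = bias = 0.0
    n = len(pairs)
    for _ in range(steps):
        g_left = g_up = g_bias = 0.0
        for (l, u), target in pairs:
            err = w_left * l + w_up * u + bias - target
            g_left += 2 * err * l / n   # d(MSE)/d(w_left)
            g_up += 2 * err * u / n     # d(MSE)/d(w_up)
            g_bias += 2 * err / n       # d(MSE)/d(bias)
        w_left -= lr * g_left
        w_up -= lr * g_up
        bias -= lr * g_bias
    return (w_left, w_up, bias)

pairs = training_pairs(make_image())
before = mse((0.0, 0.0, 0.0), pairs)
after = mse(train(pairs), pairs)
print(f"MSE before training: {before:.5f}, after: {after:.5f}")
```

Once the predictor is accurate, only the small prediction errors need to be encoded, which is where the compression gain comes from.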
Convolutional neural network (CNN)-based compression
CNNs outperform traditional computer vision algorithms in tasks such as super-resolution and compression artifact reduction. CNNs leverage the convolution operation to characterize the correlation between neighboring pixels. Cascaded convolution operations are well suited to capturing the statistical properties of natural images.
However, CNN models are difficult to incorporate in end-to-end image compression. CNN training depends on backpropagation and gradient descent algorithms.
The problem is that the quantization module in image compression is a step function, so the gradient of the loss function with respect to the convolution weights and biases is zero almost everywhere. As a result, the CNN stops updating its parameters.
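The zero-gradient problem is easy to see numerically. The sketch below estimates the derivative of rounding with a finite difference, then does the same for an additive-noise stand-in for quantization. The noise proxy is one workaround known from the literature and is an assumption here, since this article does not describe how the issue is resolved; the function names are hypothetical.

```python
def quantize(x):
    """Hard quantization to the nearest integer."""
    return round(x)

def finite_difference(f, x, eps=1e-4):
    """Central-difference estimate of df/dx at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def noisy_proxy(noise):
    """Differentiable stand-in for quantization: add a fixed noise sample."""
    return lambda x: x + noise

# Away from a rounding boundary, rounding is locally constant.
hard_grad = finite_difference(quantize, 2.3)
# The proxy shifts the value but keeps a slope of one.
soft_grad = finite_difference(noisy_proxy(0.25), 2.3)
print(hard_grad, soft_grad)
```

With a slope of zero, backpropagation passes no signal through the quantizer to earlier layers, while the proxy lets gradients flow during training.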
A first attempt to implement image compression with CNNs came in 2016. The algorithm consists of two modules: an analysis transform for the encoder and a synthesis transform for the decoder. The analysis transform is built from stages that each apply three operations: convolution, subsampling, and divisive normalization. Each stage starts with an affine convolution:
$$v_i^{(k)}(m,n) = \sum_j \left(h_{k,ij} * u_j^{(k)}\right)(m,n) + c_{k,i}$$
- $u_j^{(k)}(m,n)$ is the $j$-th input channel of the $k$-th stage at spatial location $(m,n)$
- $h_{k,ij}$ is the convolution kernel, and $*$ denotes 2-D convolution
- $c_{k,i}$ is the convolution bias
The downsampled output of the convolution is:
$$w_i^{(k)}(m,n) = v_i^{(k)}(s_k m, s_k n)$$
- where $s_k$ is the downsampling factor.
Finally, the downsampled signal $w_i^{(k)}$ is passed through a Generalized Divisive Normalization (GDN) transformation.
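The three operations of one analysis stage (affine convolution, subsampling, GDN) can be sketched in plain Python. The GDN form used here, each channel divided by sqrt(beta_i + sum_j gamma_ij * w_j^2), is the commonly published one and is an assumption relative to this article, which does not spell it out. Function names and the toy parameters are hypothetical, and a practical implementation would use a tensor library with learned parameters.

```python
import math

def conv2d_valid(u, h):
    """True 2-D convolution (kernel flipped) without padding."""
    kh, kw = len(h), len(h[0])
    out_h, out_w = len(u) - kh + 1, len(u[0]) - kw + 1
    return [[sum(h[a][b] * u[m + kh - 1 - a][n + kw - 1 - b]
                 for a in range(kh) for b in range(kw))
             for n in range(out_w)] for m in range(out_h)]

def affine_conv(u, h, c):
    """v_i = sum_j (h[i][j] * u[j]) + c[i] for every output channel i."""
    v = []
    for i in range(len(h)):
        maps = [conv2d_valid(u[j], h[i][j]) for j in range(len(u))]
        v.append([[sum(mp[m][n] for mp in maps) + c[i]
                   for n in range(len(maps[0][0]))]
                  for m in range(len(maps[0]))])
    return v

def downsample(v, s):
    """w_i(m, n) = v_i(s * m, s * n)."""
    return [[row[::s] for row in channel[::s]] for channel in v]

def gdn(w, beta, gamma):
    """Divide each channel by sqrt(beta_i + sum_j gamma_ij * w_j^2)."""
    channels, height, width = len(w), len(w[0]), len(w[0][0])
    return [[[w[i][m][n] / math.sqrt(
                  beta[i] + sum(gamma[i][j] * w[j][m][n] ** 2
                                for j in range(channels)))
              for n in range(width)] for m in range(height)]
            for i in range(channels)]

# Toy stage: one 5x5 input channel, one output channel.
u = [[[1.0] * 5 for _ in range(5)]]
h = [[[[0.25, 0.25], [0.25, 0.25]]]]  # h[i][j] is a 2x2 kernel
v = affine_conv(u, h, c=[1.0])
w = downsample(v, s=2)
out = gdn(w, beta=[1.0], gamma=[[0.75]])
print(out)
```

In the toy run, every convolved value is 2.0, subsampling by 2 leaves a 2x2 map, and GDN divides each value by sqrt(1 + 0.75 * 4) = 2.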
CNN-based compression outperforms JPEG2000 on the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM) metrics. The algorithm was further improved by applying a scale hyperprior for entropy estimation. As a result, image compression achieved performance comparable to the High-Efficiency Video Coding (HEVC) standard.
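For reference, PSNR is computed from the mean squared error between the original and the reconstructed image: PSNR = 10 * log10(MAX^2 / MSE), where MAX is the largest possible pixel value. The sketch below assumes 8-bit images stored as nested lists; the function name and sample values are hypothetical.

```python
import math

def psnr(original, compressed, max_value=255.0):
    """Peak Signal-to-Noise Ratio in decibels for two equally sized images."""
    flat_o = [p for row in original for p in row]
    flat_c = [p for row in compressed for p in row]
    mse = sum((a - b) ** 2 for a, b in zip(flat_o, flat_c)) / len(flat_o)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10.0 * math.log10(max_value ** 2 / mse)

# Hypothetical 2x2 image and a reconstruction that is off by one everywhere.
original = [[10, 20], [30, 40]]
compressed = [[11, 19], [31, 39]]
print(f"{psnr(original, compressed):.2f} dB")
```

Higher values mean the reconstruction is closer to the original; SSIM, the other metric mentioned above, compares local structure instead and is not covered by this sketch.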
Generative Adversarial Network (GAN)-based Compression
GANs are deep neural networks that consist of two models trained in opposition to one another: a generator that produces samples and a discriminator that tries to tell generated samples from real ones. An image compression algorithm based on GANs was introduced in 2017.
The 2017 algorithm produces compressed files 2.5 times smaller than JPEG and JPEG2000, 2 times smaller than WebP, and 1.7 times smaller than BPG. In addition, the algorithm can run in real time by leveraging parallel computation on GPU cores.
The GAN compression technique compresses the input image into a very small feature space. Then the generative network reconstructs the compressed image from the features.
The most obvious difference between GAN-based and CNN-based image compression is the adversarial loss, which enhances the perceptual quality of the reconstructed image. The generative network and the adversarial (discriminator) network are trained simultaneously, which significantly improves the performance of the generative model.
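How an adversarial term enters the training objective can be sketched with scalar scores. This uses the standard GAN loss formulation as an assumption; the article does not give the exact loss of the 2017 algorithm. The function names, the weight `lam`, and the sample scores are hypothetical, and `d_real` and `d_fake` stand for the discriminator's probability that an image is real.

```python
import math

def mse(a, b):
    """Mean squared error between two flat pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def generator_loss(original, reconstructed, d_fake, lam=0.01):
    """Distortion plus a weighted adversarial term.

    The adversarial term shrinks as the discriminator is fooled,
    that is, as d_fake approaches 1.
    """
    return mse(original, reconstructed) + lam * -math.log(d_fake)

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: score real images high, reconstructions low."""
    return -math.log(d_real) - math.log(1.0 - d_fake)

original = [0.2, 0.4, 0.6, 0.8]
reconstructed = [0.25, 0.35, 0.6, 0.8]
print(generator_loss(original, reconstructed, d_fake=0.3))
print(discriminator_loss(d_real=0.9, d_fake=0.3))
```

A purely CNN-based codec would minimize only the distortion term; the extra term rewards reconstructions that the discriminator cannot distinguish from real images.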
Compression with deep neural networks still presents many challenges in computational complexity and memory consumption. However, deep neural networks have proven to be very efficient in the understanding and representation of images and videos.
Their parallel-friendly structure makes deep neural networks well suited to computationally intensive tasks on GPUs and TPUs. Network-based end-to-end optimization approaches are more flexible than other methods, and deep neural network-based techniques can be quickly optimized or tuned. Neural networks have great potential to further improve image and video compression as measured by metrics like PSNR.