Using Generative Deep Learning Models On-Device

Part 1: Images

Over the last few years, generative models have been on the rise thanks to breakthrough innovations, especially in the domain of deep learning. We are now on the verge of solving complex tasks that seemed impossible 10 years ago.

There are countless applications for these techniques, so in this mini-series, we’ll focus on what they could bring to our handheld companions. In the first part let’s take a look at a few applications in image generation and transformation.

What is a Deep Generative Model?

A deep generative model is a neural-network-based learning algorithm that attempts to generate new content, or alter existing content, in a credible manner. The most famous architecture is the Generative Adversarial Network (GAN) framework [Goodfellow et al., 2014], which kickstarted the generative model revolution.

The goal of the model is either to generate data (text, audio, images, video, etc.) from scratch or to transform data to conform to a specific target. Once a generative model is fully trained, it can be used to create new data that’s indistinguishable from the training set data.

Enough theory, let’s talk about how this can improve our experience on smartphones!

1. A cure for potato bandwidth

Every year, new phones come out with better cameras, more powerful chips, and bigger screens. The more powerful our smartphones become, the more data they generate, and this presents a tough problem for Internet providers. What used to be occasional small images sent by SMS has now become an endless flow of high-definition Instagram posts and HD video livestreams.

There are times when networks aren’t able to keep up with the volume of data, and this terrifying image appears:

Websites try to combat this problem by finding creative ways to display a placeholder while your image is loading, a technique called lazy loading. Some of them display color gradients to give you a faint idea of what the image looks like. Others prefer a solid color, blurred images, or more creative solutions.

The problem is that encoding every last bit of information in the image makes it heavy. That’s why images are usually compressed, so that most of the weight is shed at the cost of some of the information. Sometimes, even compression doesn’t cut it, and we need to find a better way to display a placeholder while the image loads.

This is where generative models come into play to give us an edge. The idea is to balance the load between the network payload and the smartphone. Instead of transmitting the whole image, we transmit a latent vector representation that the generative model uses to re-create the image. The image will not be 100% like the original, especially in the fine details, but it serves as a better placeholder while we’re waiting for the rest of the image to arrive.

To achieve that, the easiest way is to use a convolutional auto-encoder, but the results are often quite blurry. A better approach is a Conditional GAN architecture, which produces more realistic photos. The GAN can be trained to maximize perceptual similarity, so that the most important visual features are preserved at the cost of fine details. The appeal of this approach is that the resulting image will still be sharp; only the details (hair fibers, strands of grass, background) will differ. For most images, that wouldn’t be a problem at all.
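To make the auto-encoder idea concrete, here is a minimal sketch in PyTorch. Everything in it is hypothetical: an untrained toy network, a 64×64 input, and an arbitrary 128-value latent code. The point is only the data flow, where the encoder would run server-side and the small decoder would ship with the app:

```python
import torch
import torch.nn as nn

# Hypothetical, untrained convolutional auto-encoder. The encoder runs
# server-side and produces a small latent code; the decoder runs on the
# phone and renders an approximate placeholder from that code.
encoder = nn.Sequential(
    nn.Conv2d(3, 16, 4, stride=2, padding=1),   # 64x64 -> 32x32
    nn.ReLU(),
    nn.Conv2d(16, 32, 4, stride=2, padding=1),  # 32x32 -> 16x16
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 128),               # 128-value latent code
)
decoder = nn.Sequential(
    nn.Linear(128, 32 * 16 * 16),
    nn.Unflatten(1, (32, 16, 16)),
    nn.ReLU(),
    nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1),  # 16x16 -> 32x32
    nn.ReLU(),
    nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1),   # 32x32 -> 64x64
    nn.Sigmoid(),
)

image = torch.rand(1, 3, 64, 64)   # server side: the original image
latent = encoder(image)            # transmit this (128 floats, light)
placeholder = decoder(latent)      # phone side: approximate preview
```

A real deployment would train this end to end (or swap the decoder for a Conditional GAN generator trained with a perceptual loss), but the payload split stays the same: only `latent` crosses the network before the full image does.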

Recent work on state-of-the-art face generation [Karras et al., 2018] hints that we might even be able to find images in the latent space of a generative model without having to train an encoder, which would make this technique applicable to a much wider range of generative models.
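Finding an image in a generator’s latent space without an encoder usually means optimizing the latent vector directly by gradient descent. The sketch below illustrates that idea under heavy assumptions: a tiny frozen linear layer stands in for a real pretrained generator `G`, and plain MSE stands in for a perceptual loss, just so the loop runs end to end:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in for a pretrained, frozen generator G: latent z -> "image".
G = torch.nn.Linear(8, 64)
for p in G.parameters():
    p.requires_grad_(False)

target = G(torch.randn(1, 8))  # the image we want to find in latent space

# Optimize the latent vector itself, not the network weights.
z = torch.zeros(1, 8, requires_grad=True)
opt = torch.optim.Adam([z], lr=0.1)

for _ in range(200):
    opt.zero_grad()
    loss = F.mse_loss(G(z), target)
    loss.backward()
    opt.step()

final_loss = F.mse_loss(G(z), target).item()  # should be near zero
```

With a real GAN the loss landscape is far less friendly than this toy quadratic, which is why published inversion methods add tricks (perceptual losses, multiple restarts), but the core loop is the same.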

To summarize, your phone would receive two things: the full-resolution image (heavy) and its latent-space representation (light). While the heavy image is loading, the generative model running on the smartphone creates an approximate placeholder based on the latent representation (or random seed) of the encoded image.

To put things in perspective, StyleGAN’s latent space is made of only 512 values, which is roughly the equivalent of sending a 13×13 RGB image instead of the full 1024×1024!
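The arithmetic behind that comparison is worth spelling out:

```python
# StyleGAN's latent code: 512 floating-point values.
latent_values = 512

# A 13x13 RGB image carries about the same number of values.
placeholder_equivalent = 13 * 13 * 3          # 507, close to 512

# The full 1024x1024 RGB image it stands in for:
full_image_values = 1024 * 1024 * 3           # 3,145,728 values

savings = full_image_values / latent_values   # 6144x fewer values sent
```

Even accounting for compression on the full image, transmitting the latent code first is orders of magnitude cheaper than waiting for the real pixels.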

2. Post-processing estimation

Another domain that would benefit from strong generative models is mobile photography. With smartphones, everyone can try to take breathtaking photos and selfies. However, taking a great picture isn’t straightforward and requires a lot of skill and practice. More often than we’d like, we end up with over- or underexposed photos. Some can be salvaged with post-processing, while others can only be thrown away.

A conditional generative model could be used to learn what kind of corrections one can apply in, say, Lightroom or Photoshop, and use that information to generate and display estimates of what the retouched image would look like. This would introduce the novice photographer to the idea of taking a photograph not for what it is, but for what it can become. You could then take the best picture, knowing the limits of what post-processing can achieve.

A variation on this would be to have a model trained to automatically transform a photo to make it match your Instagram aesthetic. This could encompass color grading, filters, and cropping. Going even further, the model could generate variations of the image (slight adjustments of pose, eye position, or facial expression), and you’d be able to choose the one you prefer.

3. Wallpaper generation

Smartphone users tend to treat lock screens and wallpaper in one of two ways: some use photos, while others prefer the aesthetic and simplicity of abstract or minimalist wallpapers. A generative model can be trained to create such wallpapers for your phone.

They could be static or dynamic and conditioned on the color/style you prefer. Where this gets interesting is that, instead of generating one image, you could ask the model to generate a new wallpaper every day, hour, or every time you unlock your phone.

Each one would be similar in style to the others, but still new. Imagine seeing a new Picasso or Malevich every time you take out your phone. How awesome would that be?

Disclaimer: we’re not there yet, but this should come eventually.

What’s more, you could have the general mood of the wallpaper conditioned on the weather, the time of day, how much battery you have left, how close you are to the weekend, etc. I personally think this would either be super cool and fun, or get old pretty fast.


Generative models are often quite heavy, and training them is difficult and expensive, but they enable endless possibilities. They may be harder to get working and to fit onto smartphones, but our species is pretty good at solving these kinds of issues.

The utility and creative potential of image generation might very well spark a new wave of smartphone apps, much like convolutional neural networks have done for image recognition.



Our team has been at the forefront of Artificial Intelligence and Machine Learning research for more than 15 years and we're using our collective intelligence to help others learn, understand and grow using these new technologies in ethical and sustainable ways.
