Computer Vision — SnapML
Style transfer is one of the most creative applications of convolutional neural networks. It allows you to capture the style of one image and use it to transform any other image.
It is an interesting technique that highlights the capabilities and internal representations of neural networks. It can also be useful in certain scientific fields for augmenting or simulating image data.
The almost endless possible combinations of content and style bring out unique and ever more creative results from neural network enthusiasts.
To make it work, you choose a reference image for the content and a reference image for the style (such as the artwork of your favorite painter). A third image, initialized with the content image, is then optimized bit by bit until it still shows the content of the first image but appears “painted” in the style of the second.
The principle of style transfer consists mainly of defining two distance functions:
- The first one describes the difference between the content of two images.
- The second one describes the difference between two images in relation to their styles.
It is then sufficient to minimize these two distances, using a backpropagation technique to obtain, after optimization, an image that matches the content of the content image and the style of the style image.
Both distances are computed from feature representations extracted from intermediate layers of the neural network.
One impressive aspect of this technique is that no new neural network training is required — using pre-trained networks like VGG19 is sufficient and works well.
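For the curious, here is a minimal sketch of the classic optimization-based approach (Gatys et al.) using PyTorch and a pre-trained VGG19 from torchvision. The layer indices, loss weights, and optimizer are illustrative assumptions on my part, and Fritz Studio’s internals (which most likely train a fast feed-forward network rather than optimizing a single image) will certainly differ.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

device = "cuda" if torch.cuda.is_available() else "cpu"

# Pre-trained VGG19 used purely as a fixed feature extractor -- no training.
features = vgg19(weights="IMAGENET1K_V1").features.to(device).eval()
for p in features.parameters():
    p.requires_grad_(False)
for m in features.modules():          # use non-in-place ReLUs so saved
    if isinstance(m, torch.nn.ReLU):  # feature maps are not overwritten
        m.inplace = False

CONTENT_LAYERS = {21}                # conv4_2 (assumed choice)
STYLE_LAYERS = {0, 5, 10, 19, 28}    # conv1_1 .. conv5_1 (assumed choice)

def extract(x):
    """Collect feature maps from the chosen intermediate layers."""
    content, style = {}, {}
    for i, layer in enumerate(features):
        x = layer(x)
        if i in CONTENT_LAYERS:
            content[i] = x
        if i in STYLE_LAYERS:
            style[i] = x
    return content, style

def gram(feat):
    """Gram matrix: channel correlations that characterize the 'style' distance."""
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def transfer(content_img, style_img, steps=300, style_weight=1e6, content_weight=1.0):
    """content_img / style_img: normalized (1, 3, H, W) tensors on `device`."""
    c_ref, _ = extract(content_img)
    c_ref = {i: f.detach() for i, f in c_ref.items()}
    _, s_ref = extract(style_img)
    s_ref = {i: gram(f).detach() for i, f in s_ref.items()}

    # Third image, initialized with the content image, optimized bit by bit.
    result = content_img.clone().requires_grad_(True)
    opt = torch.optim.Adam([result], lr=0.02)
    for _ in range(steps):
        opt.zero_grad()
        c_out, s_out = extract(result)
        content_loss = sum(F.mse_loss(c_out[i], c_ref[i]) for i in c_out)
        style_loss = sum(F.mse_loss(gram(s_out[i]), s_ref[i]) for i in s_out)
        (content_weight * content_loss + style_weight * style_loss).backward()
        opt.step()
    return result.detach()
```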
But you don’t have to worry about all this, because Fritz Studio will take care of all the technical parts!
Finding a Good Style Image
This is probably the trickiest part, because you have to pick a style image that matches your expectations of how the model will extract the style. In my experience, choose style images with lots of detail, or with a well-defined style and obvious patterns and colors. A very simple image will give you poor or confusing results.
There are plenty of examples on the internet, so do some research on the outcome of different style images and find the one that matches the look that you are going for.
Understanding the Training Parameters
When starting a new training job, Fritz Studio gives you a set of parameters to fine-tune in order to achieve your desired style. The parameters are as follows (a rough sketch of how they might map to loss terms appears after the list):
- Style: Scaled from 1 to 10, this parameter defines how prominent the style image will be in the styled output. Put simply, the higher the value, the more the output image will resemble the style image.
- Content: The counterpart of style, this parameter defines the degree to which the input image keeps its original content. Also scaled from 1 to 10.
- Variation: This parameter lets you smooth out the styled image. Say your style image has a number of colors, with one or two dominant ones: increasing the variation value washes out the less prominent colors, so the less common ones fade from the output. The same applies to shapes. Variation is scaled from 0 to 10.
- Stability: This parameter is specifically intended for video mode in Snapchat. When you make a video using any ML model, you run inference on each and every frame. Increasing the stability value smooths the transition between consecutive frame transformations; in other words, the model will try to stabilize the video.
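To build intuition about these sliders, here is a rough sketch of how they might map onto the weights of a typical style transfer training loss. This is purely my own interpretation, with placeholder functions, and not a description of Fritz Studio’s actual internals.

```python
import torch
import torch.nn.functional as F

def total_variation(img):
    """Penalizes abrupt pixel changes; a higher weight smooths colors and shapes."""
    dh = (img[..., 1:, :] - img[..., :-1, :]).abs().mean()
    dw = (img[..., :, 1:] - img[..., :, :-1]).abs().mean()
    return dh + dw

def combined_loss(content_loss, style_loss, stylized, prev_stylized,
                  style=1.0, content=1.0, variation=5.0, stability=5.0):
    # "Style" and "Content" weight the two distances from the previous section.
    loss = content * content_loss + style * style_loss
    # "Variation" weights a smoothness regularizer on the styled output.
    loss = loss + variation * total_variation(stylized)
    # "Stability" penalizes flicker between consecutive video frames
    # (a simple temporal-consistency term, assumed here for illustration).
    if prev_stylized is not None:
        loss = loss + stability * F.mse_loss(stylized, prev_stylized)
    return loss
```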
Train a Style Transfer Model with Fritz Studio SnapML
A style transfer model is probably the easiest deep learning model to train, because you don’t have to collect and annotate a huge amount of data. All you need is a style image for training, and that’s it.
Create a new project
Create a new project, name the project, and choose “Style Transfer” as a template project.
Start a new training job
Start by giving a name to your style, then upload the style image and specify the training parameters. By default, the parameters are set as follows:
- Style: 1
- Content: 1
- Variation: 5
- Video stability: 5
Then, finally, set the training budget. By default, Fritz recommends 2 hours, but you can set any time budget you want. In all the experiments I did, the models converged in around 2 hours and 20 minutes. Conveniently enough, you will receive an email when the process is finished.
Experiment results
In order to illustrate the importance of the fine-tuning parameters, I did a set of training jobs to showcase how each and every parameter will impact the final model.
Model 1
This model was trained using the default settings:
- Style: 1
- Content: 1
- Variation: 5
- Video stability: 5
I think the default settings will give different results depending on the style image you use to train the model. Since both the style and content parameters are at their minimum, the output will depend on the content of both images.
Model 2
- Style: 10
- Content: 1
- Variation: 5
- Stability: 5
Increasing the style parameter to the maximum makes the model emphasize the colors and shapes of the style image rather than changing the output image in a subtle way. This approach is a bit too aggressive in my opinion, since it almost erases the input image’s content.
Model 3
- Style: 1
- Content: 10
- Variation: 5
- Stability: 5
This is the opposite of Model 2 above, because we max out the content parameter and minimize the style parameter. I think that with the right style image, you can achieve a great filter look rather than a radical style transfer transformation of the input image.
Model 4
- Style: 5
- Content: 5
- Variation: 5
- Stability: 5
Not my favorite of them all, but we can see some building shapes from the input image. To my taste, a great style transfer model will try to keep some important elements of the input image and add some crazy textures, shapes, and colors in order to achieve a beautiful output.
Model 5
- Style: 5
- Content: 10
- Variation: 5
- Stability: 5
Keeping the content parameter at the maximum and moderately increasing the style parameter is, in most cases, the best balance between beautifully transforming the input image and preserving its core content. The result will change depending on your input and style images, but in my experience, this is the best-balanced combination.
Conclusion
Mastering style transfer models is not easy, and the ability to train one without touching any code is even harder to come by. Fritz Studio lets you train and iterate over different parameters in a way that is simply painless.
Some limitations come with Snapchat’s requirements: models have to run smoothly at around 30 frames per second and be compatible with a vast range of devices. These restrictions limit the size of the models that can be used and narrow the spectrum of possibilities.
Nonetheless, having the ability to create a crazy Snapchat Lens using the power of machine learning with no code involved is just fantastic!
Thank you for reading this article. If you have any questions, don’t hesitate to send me an email at [email protected].