Machine Learning Primer for Snapchat Lens Creators

The basics of ML for Lens Creators, as well as a primer on SnapML

We’ve talked with more than 20 amazingly talented Snapchat Lens Creators over the last few months, and while they’ve been unanimously excited about the potential of creating Lenses with custom machine learning models, there remains a recurring caveat attached to the end of that sentiment: “when I can get a grasp of the ML side of things.”

Machine learning is tough—especially when you don’t have much in the way of knowledge or previous experience. And the same thing (so far) goes for working with ML in Lens Studio.

We’re seeing a rapid increase in the amount of resources, pre-trained template projects, and even model development tools designed specifically to help make ML more accessible for Lens Creators.

This is the latest in an ongoing series of loosely-connected posts that aim to break ML down into pieces that are at once informative and less intimidating. Here, I want to cover some core pieces of what we call the machine learning lifecycle — with particular focus on the parts of the lifecycle that will most interest Lens Creators.

While we can’t possibly discuss all there is to know about ML here, I’ll try to cover the following essentials:

  • ML Basics (data, model training, etc)
  • SnapML Basics
  • Best Practices

Let’s get started!

ML Basics

Artificial intelligence is a wide-ranging discipline that includes algorithms and statistical models capable of performing various tasks without explicit instructions.

Machine learning (ML) models (a subset of AI) are shown training data from which they learn patterns and correlations that help them achieve their goals. These models are the engines inside things like predictive keyboards, intelligent photo organizers, recommendation systems, object trackers — and Snapchat Lenses.

In fact, if you’ve ever used or created a Snapchat Lens, then you’ve interacted with machine learning! Many of the most immersive things that Lens Studio allows you to do out of the box—the clearest example being the impressive face tracking capabilities—are powered by ML models. The Lens Studio team has done an exceptional job of making these ML models implicit in the AR experiences themselves, so you don’t have to be an ML expert or data scientist to work with them.

Even though you don’t need in-depth knowledge or a PhD to get started, a basic, foundational understanding of ML will be useful if you’re considering working with your own models via SnapML, Lens Studio’s custom ML framework.

Machine Learning Contexts

ML occurs in two primary contexts: model training and model inference. Models are trained on large datasets of images, text, or other data. The resulting predictive models are then deployed to target environments or devices (e.g. to Lens Studio or to a smartphone).

Once these models have been trained and deployed, they make predictions on target tasks (e.g. predicting the contents of an image, or the translation of a text sequence).
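To make “inference” a bit more concrete, here’s a minimal sketch in Python that loads a pretrained image classifier and predicts the contents of a single image. The image path is just a hypothetical placeholder, and the torchvision model is only one of many you could use.

```python
# A minimal inference sketch: load a pretrained classifier and predict
# the contents of a single image. The image path is hypothetical.
import torch
from torchvision import models, transforms
from PIL import Image

model = models.mobilenet_v2(pretrained=True)  # small, mobile-friendly network
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("pizza.jpg").convert("RGB")  # hypothetical test image
batch = preprocess(image).unsqueeze(0)          # shape: [1, 3, 224, 224]

with torch.no_grad():
    logits = model(batch)
    predicted_class = logits.argmax(dim=1).item()  # index into ImageNet's 1000 classes

print("Predicted ImageNet class index:", predicted_class)
```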

To connect these two contexts in which ML occurs, ML engineers and teams work through a model development lifecycle — collecting and labeling data, training models, and testing/deploying those models.

Let’s cover some of the key basics you’ll want to keep in mind.

Working with Data

Data is the foundation for all machine learning projects. It’s the source material from which models learn to perform their intended tasks. But data is also one of the biggest barriers to entry, especially for ML projects that require unique kinds of data, where open source datasets aren’t readily available (a custom brand logo, say).

Generally speaking, the more training data you show an ML model, the better it will perform on data it encounters in the real world (previously unseen “test” data). Additionally, you’ll need to tell your ML model what to do with the data you show it. This is a process known as data labeling or data annotation.
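To make labeling more concrete, here’s a hypothetical sketch of what annotation records might look like for two common model types. The field names are purely illustrative; different tools and formats structure this differently.

```python
# Hypothetical annotation records (illustrative field names, not a required format).

# Image classification: each image just needs a class label.
classification_labels = [
    {"image": "img_0001.jpg", "label": "pizza"},
    {"image": "img_0002.jpg", "label": "not_pizza"},
]

# Object detection: each image needs one bounding box (and label) per object,
# here as [x_min, y_min, width, height] in pixels.
detection_labels = [
    {"image": "img_0001.jpg",
     "objects": [{"label": "brand_logo", "bbox": [34, 50, 120, 80]}]},
]
```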

Here are a few key things to keep in mind when building and working with datasets (note that these apply only to datasets of images for computer vision tasks):

  • The number of images in your initial training dataset should be in the 1000s, not in the 100s. We recommend a bare minimum of 1000 for models deployed to mobile platforms. Models trained on anything less will struggle to perform their intended task.
  • Your annotations MUST correspond to the kind of model you’re trying to train. Different ML model types require different kinds of information from training data. It’s essential to ensure that you’re adding the right annotations to your training data — otherwise, models trained on it will fail. For example, image classification models need a class label for each image, object detection models need bounding boxes around each object of interest, and segmentation models need pixel-level masks.
  • The more diversity in your dataset, the better. For this point, let’s consider an example. Say you’re building an object detection model for a particular brand logo — you want to identify and track the logo as it appears in and moves throughout the camera scene, and then attach an AR effect to the logo. When building this training dataset, you’ll want to make sure the data shows your model the logo in a wide variety of conditions — with different backgrounds, lighting conditions, positioning, amounts of noise (non-logo image elements), and more. These adjustments to your training data are known as data augmentations, and they help trained models perform better under different environmental conditions, like dim lighting or blur. There are a number of open source tools that offer programmatic solutions to this process (see the sketch after this list).
  • Collecting more “ground-truth” data can help improve your models over time. More than likely, the first version of a model you train won’t perform perfectly — that’s ok, and to be expected! Having a system to collect more “ground-truth” data (or data collected from the real world) can help improve datasets, and thus improve models as you iterate on them. The more you can train your model on data that reflects what that model will see in the real world, the better.
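As one example of that kind of programmatic augmentation, here’s a minimal sketch using torchvision’s transforms to randomly vary lighting, orientation, blur, and framing. The dataset folder is a hypothetical placeholder, and libraries like albumentations offer similar functionality.

```python
# A minimal augmentation sketch using torchvision transforms.
# The dataset folder is hypothetical; each augmentation simulates a
# different real-world condition (lighting, orientation, blur).
from torchvision import datasets, transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),                        # mirrored scenes
    transforms.ColorJitter(brightness=0.4, contrast=0.4),     # dim/bright lighting
    transforms.RandomRotation(degrees=15),                    # tilted camera
    transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)), # motion/focus blur
    transforms.RandomResizedCrop(224, scale=(0.6, 1.0)),      # varied positioning
    transforms.ToTensor(),
])

# ImageFolder expects one subfolder per class, e.g. training_data/logo/ and
# training_data/background/ (hypothetical folder names).
dataset = datasets.ImageFolder("training_data/", transform=augment)
```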

Taken together, these processes may seem relatively clear, but they can also be incredibly expensive and time-consuming, even if you do have access to some readily available data.

We’re particularly excited about the possibilities of generating synthetic data, a process by which you can programmatically create new training data from a limited number of samples. This is what we use to power our dataset generator in Fritz AI.

Model Training

Model training is the process by which training data (as discussed above) is fed to a machine learning model (known as a model “architecture”) in order to teach that model how to perform a predictive task (e.g. recognizing a pizza, detecting a brand logo, separating a background from a foreground).

So far, I’ve more or less used the phrases machine learning and AI interchangeably. Technically speaking, we’re working with a particular type of AI/ML model — neural networks.

Neural networks are just one type of machine learning model. Within the breakdown of neural network models, a subset has proven to be particularly powerful. “Deep” models, featuring many stacked layers, perform extremely well on many tasks, which has given rise to the field of “deep learning”.

Although there are many machine learning algorithms that don’t use neural networks at all, the versatility of neural networks makes them a popular choice for many projects, especially for platforms like Snapchat that need immersive experiences to run on hardware-limited smartphones.

Underneath every machine learning model is a low-level framework supplying the basic mathematical operations used to train a model and make predictions. Though there are more than a dozen frameworks out there, the vast majority of neural network-based projects are written within the TensorFlow or PyTorch ecosystems. If you’re just getting started with machine learning and you’ve done some research on a particular model or feature you hope to build, chances are you’ve come across a repository that uses one of these tools.
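To ground the idea of feeding training data to a model architecture, here’s a heavily simplified PyTorch training loop. The dataset folder and hyperparameters are placeholders, and a real project would add validation, checkpoints, and the augmentations discussed above.

```python
# A heavily simplified training loop in PyTorch (dataset path and
# hyperparameters are placeholders, not recommendations).
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# One subfolder per class, e.g. training_data/logo/ and training_data/no_logo/.
train_data = datasets.ImageFolder("training_data/", transform=transform)
loader = DataLoader(train_data, batch_size=32, shuffle=True)

# Start from a small pretrained backbone and replace its final layer
# so it predicts our classes instead of ImageNet's.
model = models.mobilenet_v2(pretrained=True)
model.classifier[1] = nn.Linear(model.last_channel, len(train_data.classes))

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-4)

for epoch in range(5):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```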

You can deploy ML models built with these primary frameworks to cloud-based applications, local servers and devices, or — in the case of Lens Studio — into Snapchat Lenses. But these different platforms and target devices come with different requirements that you’ll need to keep in mind.

As such, when considering training models for use in Lens Studio, there are a few specific things to keep in mind:

  • Models used with SnapML in Lens Studio must be in either Protobuf (.pb) or ONNX (.onnx) file format. This means converting your base models to these formats and ensuring they’re otherwise compatible with Lens Studio. Lens Studio provides docs on exporting both TensorFlow and PyTorch models to these formats; a PyTorch export example is sketched after this list.
  • ML Assets (Lens Studio’s name for models) must be under 10MB. This means that model training should include optimization and compression techniques when available (pruning, distillation, etc). It’s beyond the scope of this post to dive into those techniques, and many template Python Notebooks include them already, but it’s helpful to at least be familiar with the concepts.
  • Not all Python Notebooks are created equal. If you’re new to ML and trying to work with it in Lens Studio, then chances are you’ve encountered Python Notebook templates (iPython or Jupyter Notebooks, more precisely) full of ML code, alongside instructions for things like feeding the model training data. For style transfer models (the easiest model type to get started with), you only need one style image and one test image to run Lens Studio’s template Notebook. But for model types that require a lot more data, you’ll need to collect and annotate that on your own, configure the specific labels as needed (i.e. targeting only certain objects in a larger dataset, like COCO), manage how the data is fed to the model, and more. As such, keep in mind that some model types will be quite a bit easier to work with than others.
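As a concrete example of the PyTorch side of that conversion, here’s a minimal sketch that exports a model to ONNX and checks the resulting file size against the 10MB limit. The model and input shape are placeholders; use whatever your own model expects.

```python
# A minimal ONNX export sketch for a PyTorch model, plus a file-size check
# against Lens Studio's 10MB limit. The model and input shape are placeholders.
import os
import torch
from torchvision import models

model = models.mobilenet_v2(pretrained=True)
model.eval()

dummy_input = torch.randn(1, 3, 224, 224)  # batch of one 224x224 RGB image
torch.onnx.export(
    model,
    dummy_input,
    "lens_model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=11,
)

size_mb = os.path.getsize("lens_model.onnx") / (1024 * 1024)
print(f"Exported model size: {size_mb:.1f} MB (must be under 10 MB for Lens Studio)")
```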

SnapML Basics

Once you have a model trained and ready to use in Lens Studio, you’ll need to leverage SnapML, which is the high-level framework for managing custom models. We’ve covered a lot of the basics and intricacies of SnapML in other blog posts (you can find those below in the Additional Resources section), but here I wanted to quickly review the basics of what you’ll need to know as you get started with SnapML in Lens Studio.

What is SnapML?

SnapML is a framework that allows you to integrate and configure custom neural networks inside Lens Studio — as such, it’s one of the only parts of Lens Studio that lets you extend the capabilities of the platform itself.

Where can I find/build models?

There’s already a growing ecosystem of ML template projects, pre-trained models, and no-code model building tools available to use. We offer both a model zoo and model-building tools at Fritz AI, but you can also work with templates and Python Notebooks provided by Snap, or other ML tools for creators like Runway ML. As a note, at this time, models themselves cannot be built inside Lens Studio — they are generally either incorporated in template projects or trained elsewhere and uploaded separately.

What can I create with SnapML?

Great question! We understand a lot of what’s possible on the ML side, but what can actually be created on top of the most common ML tasks (classification, object detection, segmentation, style transfer, pose tracking/estimation) is really a question for the Lens Creator community.

However, we’ve put together a few resources that can help you better conceptualize how ML models can give you much more fine-grained information about a given scene.

Best Practices

For those of you who are new to ML, there are a few basic best practices to consider that might look slightly different than (or perhaps similar to!) other technical or creative processes you’re used to. I’d like to highlight a couple of those here.

  • Develop a clear understanding of what you want the ML part of your project to do — in both practical and technical terms. This is the difference between saying, “I want to detect brand logo X” and saying “I want to locate and track instances of brand logo X as they move through a given camera scene, so that I can attach and anchor AR effect Y to that logo.” Being deliberate about this will make the process of building your model and connecting it to your Lens more defined and approachable.
  • Search for pre-trained models and already-collected datasets first. If used effectively, these can save a lot of time. This can also build the muscle memory of working with ML and layering AR effects on top of predictive models before moving on to your own custom models.
  • Start with something “simple” and work your way up. When thinking about the seemingly limitless possibilities of working with ML in your Lenses, it can be tempting to jump straight to a really incredible idea that’s, in practice, quite complicated and difficult to pull off. Additionally, seeing what actually needs to happen in order for complicated projects to succeed can make ML seem entirely too daunting. Start with something simpler and more approachable — maybe the famous “Not Hotdog” app from Silicon Valley or a Style Transfer Lens (which only requires one training image). Whatever you choose, starting small helps you get a feel for what ML models actually do — yet another argument for starting with pre-trained models.
  • Don’t be discouraged by inconsistent model results. Machine learning is all about iteration. If you train V1 of a model and it produces some false positives or other wayward predictions, then you’re not alone. This is what ML engineers and dev teams experience daily. You might need to add more labeled images to your training dataset, or maybe the model needs to be trained for longer. These are only two broad examples of ways to experiment, but the main takeaway here is that it’s highly unlikely that your models will perform perfectly the first time around — but they can and will improve over time.

Conclusion

Machine learning is hard, and traditionally it’s something that involves a robust development lifecycle and months of investment. We realize that Lens creation often doesn’t exist on the same timeline, with professional project deadlines ranging from 2–6 weeks.

As such, we’re working to make ML easier, more accessible, and something that doesn’t take months of development time that most Creators don’t have. Stay tuned, as we’ll continue to share more tutorials, how-to’s, overviews, spotlights, and more on SnapML and working with machine learning in Lens Studio.

In the meantime, check out this webinar we co-hosted with our friends at Poplar for an overview of how AI can be combined with AR more broadly—and why it can lead to more immersive, engaging experiences.

