Machine learning unlocks a wide range of possibilities when it comes to AR development—nobody has proved that more than Snapchat, whose immersive augmented reality Lenses work so well in large part because of the ML models running underneath them (things like face tracking, background segmentation, and more).
With SnapML, though, Snapchat has opened up this expansive capability to its community of Lens Creators, allowing them to implement custom machine learning models in their Lenses.
But machine learning isn’t always easy to understand conceptually, much less implement in an application or experience. In this blog post I want to try to demystify at least one part of this equation—what’s actually creatively possible with ML in Lens Creation.
I’m by no means an artist, a designer, or an especially visual thinker—so my plan here is to define and frame core machine learning features in a context that helps you, the true creators, build immersive experiences that take advantage of ML.
Also commonly referred to as image labeling and image recognition, image classification is a computer vision task that simply predicts labels for what is seen within an image or video frame.
Often, that prediction is mapped to a label that can either be shown to end users or trigger a global AR effect. For example, you might have an image labeling experience that understands you’re looking at a beautiful dinner table topped with food.
Classification models can recognize any number of target objects. The Pokémon classification model below can recognize 149 “classes” of Pokémon.
Put another way, the confetti effect above isn’t targeted or localized within the camera scene. If the AR effect you’d like to implement doesn’t need to be connected directly to the component being classified (in this case, the Pokémon themselves), then a classification model should work well.
Possible Use Cases:
- AR treasure hunt/bingo
- Healthy/junk food classifier
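To make the idea concrete, here is a minimal sketch of the logic behind a classification-triggered effect. This is plain Python with made-up labels and a hand-rolled softmax, purely to illustrate the pattern (Lens Studio scripts are written in JavaScript against its own MLComponent API, which works differently): the model emits one score per class, and a global effect fires only when the top prediction clears a confidence threshold.

```python
import math

# Hypothetical class labels for a tiny food classifier (illustrative only).
LABELS = ["pizza", "salad", "turkey"]

def softmax(logits):
    """Convert raw model outputs into probabilities that sum to 1."""
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits, threshold=0.5):
    """Return (label, confidence) if the top prediction clears the
    threshold, otherwise None (i.e. no effect is triggered)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] >= threshold:
        return LABELS[best], probs[best]
    return None

# A confident "turkey" prediction would trigger the global effect;
# an uncertain frame triggers nothing.
print(classify([0.1, 0.2, 3.0]))
print(classify([1.0, 1.0, 1.0]))
```

Note that the prediction applies to the frame as a whole—there is no position attached to the label, which is exactly the limitation the next task addresses.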
If you want to anchor an AR effect to a target object as it moves through a scene, then image classification won’t quite do the trick. Object detection takes things a step further, locating, counting, and tracking instances of each target class and providing four “bounding box” coordinates—essentially a rectangle outlining the detected object:
Because object detection allows us to locate each individual instance of targeted objects in images or across video frames, AR effects can become much more connected to those objects. Consider the dinner table example in the previous section—with object detection, you could locate a turkey on a Thanksgiving table, rather than just recognizing the table of food as a whole.
This kind of model could also be especially effective for Lenses that need to anchor AR effects to brand logos, product packaging, or other unique objects that you want users to engage with directly.
Possible Use Cases:
- Body part tracking (wrists, hands, faces)
- Brand logo engagement
- Custom object tracking (really, your imagination is the limit here)
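A quick sketch of how those bounding boxes translate into anchored effects. The detection format and labels below are assumptions for illustration (real SnapML detector outputs and the Lens Studio JavaScript API will differ), but the core move is the same: each detected instance comes with a box, and the box gives you a per-instance anchor point for the effect.

```python
def box_center(box):
    """Center of an (x_min, y_min, x_max, y_max) bounding box —
    a natural anchor point for an AR effect."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)

def anchors_for(detections, target, min_score=0.5):
    """One anchor point per confidently detected instance of `target`."""
    return [box_center(d["box"])
            for d in detections
            if d["label"] == target and d["score"] >= min_score]

# Hypothetical per-frame detector output:
detections = [
    {"label": "turkey", "box": (40, 60, 120, 140), "score": 0.91},
    {"label": "pie",    "box": (200, 50, 260, 110), "score": 0.84},
    {"label": "turkey", "box": (300, 80, 360, 160), "score": 0.32},  # filtered out
]
print(anchors_for(detections, "turkey"))  # → [(80.0, 100.0)]
```

Re-running this every frame is what lets the effect follow the object as it moves.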
Segmentation models produce pixel-level masks for green-screen-like effects or photo composites. Segmentation can work on just about anything: a person’s hair, the sky, a park bench — whatever you can dream up. So if you want your AR effects to track closely to an object, or augment entire parts (i.e. segments) of a scene like the floor or walls, then segmentation is the right machine learning task for you.
For instance, in the gif below, you can see that the coloring effect is applied just to the subjects’ hair—this is because the underlying segmentation model predicts pixel-level masks of human hair.
Segmentation, thus, takes us another step further when it comes to scene understanding. Boundaries between objects or scene elements become more clear, and the opportunities to augment and manipulate those boundaries become more fine-grained.
Possible Use Cases:
- Hair style changer (see above)
- Wall paint tester
- Sticker/cut-out maker
- Background replacement
- Sky replacement
- Ground replacement (i.e. hot lava, the childhood classic)
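Here is a minimal sketch of what “pixel-level mask” means in practice, using plain Python lists rather than any real image library (an actual Lens would do this on the GPU). The segmentation model outputs a per-pixel confidence map the same size as the frame; the effect—here, blending in a tint, as in the hair-coloring example—is applied only where the mask is confident.

```python
def tint_masked(image, mask, tint, threshold=0.5):
    """Blend `tint` into pixels where the segmentation mask is confident.
    image: H×W list of (r, g, b) tuples; mask: H×W list of floats in [0, 1]."""
    out = []
    for row_px, row_m in zip(image, mask):
        out.append([
            tuple((p + t) // 2 for p, t in zip(px, tint)) if m >= threshold else px
            for px, m in zip(row_px, row_m)
        ])
    return out

# A 1×2 gray "image": the left pixel is confidently "hair", the right is not.
image = [[(100, 100, 100), (100, 100, 100)]]
mask = [[0.9, 0.1]]
print(tint_masked(image, mask, (255, 0, 255)))
# → [[(177, 50, 177), (100, 100, 100)]]
```

The same thresholded-mask idea drives background replacement and cut-out effects—you just composite a different image where the mask is (or isn’t) confident instead of tinting.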
Style transfer is a computer vision task that allows us to take the style of one image and apply its visual patterns to the content of another image/video stream.
Style transfer is one of the easier ML tasks to get started with, as you don’t need a full dataset of images—just one good style image that the model will learn visual patterns from. On their own, style transfer models make for interesting artistic expressions, and with SnapML, it’s never been easier to get those creative Lenses in front of the world.
Possible Use Cases:
- Add artistic styles to short films or games
- Reimagine famous works of art (or your own)
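For the curious, a glimpse at how a model “learns visual patterns” from a single style image: a widely used approach represents style via Gram matrices of a network’s feature maps—which feature channels tend to activate together—and trains the model to match them. The sketch below computes a Gram matrix and a style distance from toy hand-written features; real models extract these features with a deep network, which is omitted here.

```python
def gram_matrix(features):
    """features: C lists of N values (C channels, each flattened).
    G[i][j] = dot(channel_i, channel_j) measures which channels
    co-activate — a common proxy for an image's 'style'."""
    C = len(features)
    return [[sum(a * b for a, b in zip(features[i], features[j]))
             for j in range(C)] for i in range(C)]

def style_distance(f1, f2):
    """Sum of squared differences between two Gram matrices — the quantity
    style-transfer training drives toward zero."""
    g1, g2 = gram_matrix(f1), gram_matrix(f2)
    return sum((a - b) ** 2 for r1, r2 in zip(g1, g2) for a, b in zip(r1, r2))

# Identical toy "feature maps" have zero style distance; different ones don't.
print(style_distance([[1, 2], [3, 4]], [[1, 2], [3, 4]]))  # → 0
```

This is also why one style image suffices: the Gram-matrix statistics of that single image are the entire training target.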
While object detection and image segmentation can be used to track target objects as they move through a scene, pose estimation is the ML task that can most effectively locate, understand, and track target objects.
Pose estimation models do this by learning and predicting a target object’s “keypoints”—basically, landmark points that create a skeleton of the object.
While you can estimate the poses of rigid objects (like the mug in the above image), the most common form of pose estimation is centered on tracking human movement.
Lens Studio now offers a 2D full-body tracking template (via pose estimation), so this powerful task has become more accessible than ever before.
Possible Use Cases:
- Fitness/Exercise tracking
- Virtual try-on experiences
- Movement-based gaming
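To show what working with keypoints looks like, here is a small sketch of the kind of downstream logic a fitness Lens might run on a pose model’s output: given three keypoints (say, shoulder–elbow–wrist), compute the joint angle to tell a straight arm from a curl. The keypoint coordinates are invented for illustration; a real model supplies them per frame, and the Lens Studio scripting environment is JavaScript rather than Python.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at keypoint b, formed by keypoints a-b-c —
    e.g. shoulder-elbow-wrist for counting bicep curls."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    return math.degrees(math.acos(dot / norm))

# A straight arm reads ~180 degrees; a right-angle bend reads 90.
print(round(joint_angle((0, 0), (0, 1), (0, 2))))  # → 180
print(round(joint_angle((0, 0), (0, 1), (1, 1))))  # → 90
```

Counting reps is then just watching that angle cross a threshold and back across successive frames.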
While these are some of the core computer vision-based ML tasks that can power SnapML projects in Lens Studio, the use cases presented here only scratch the surface.
For instance, this really interesting Lens (Learn Nepali) from Atit Kharel combines an object detection model with a translation model (in the natural language processing family) to create an experience that allows users to point their camera at a range of objects and instantly see a translation of that object.
This is the kind of inventive use of custom ML within Snapchat Lenses that can lead to experiences that are imaginative, immersive, educational, and really fun.
If you have an idea for an ML-powered Lens, but you aren’t sure if it’s feasible or which model type you’d need to work with, let us know. We’d love to hear what you’re working on and try to help you make it a reality—at least on the ML side of things!
And if you’d like to dive into SnapML in more detail, check out our complete guide, which looks at the framework in its totality—what it can currently do, its opportunities and challenges, and more.