Clone is the latest SnapML-powered Lens built by the team at Fritz AI. As the name suggests, the Lens lets users create digital clones of objects in the real world. If you haven’t already, try it out for yourself here:
In this post, I want to provide a quick behind-the-scenes look at how the Lens works and how we leveraged state-of-the-art AI models with Snap Lens Studio and SnapML to create the cloning effect.
The first step of any AI/ML problem is to define the task. In this case, we wanted to look at an image (or a section of one) and separate the foreground object from the background. This task is generally known as saliency detection, and it's closely related to image segmentation with two classes: background and foreground.
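To make that concrete, here's a minimal sketch of what a saliency model outputs and how it becomes a binary foreground mask. The network itself isn't shown; `saliency` below is a stand-in for its per-pixel foreground probabilities in [0, 1], and the function name is illustrative, not part of our actual code.

```python
def to_mask(saliency, threshold=0.5):
    """Threshold a 2D saliency map into a binary foreground mask."""
    return [[1 if p >= threshold else 0 for p in row] for row in saliency]

# A tiny 3x3 "saliency map": high values where the object is.
saliency = [
    [0.1, 0.2, 0.1],
    [0.3, 0.9, 0.8],
    [0.2, 0.95, 0.1],
]

mask = to_mask(saliency)
# mask -> [[0, 0, 0], [0, 1, 1], [0, 1, 0]]
```

In practice the mask is used as an alpha channel rather than a hard 0/1 grid, but the idea is the same: every pixel gets a foreground/background decision.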
Using our own expertise designing small, efficient neural networks, along with some inspiration from the impressive U²-Net model, we created a saliency model that produces high-quality segmentations while still fitting well under Lens Studio's 10 MB asset limit.
With a model in hand, we needed to build the rest of the cloning experience around it in Lens Studio. We started with a list of user experience requirements. The Lens user would be able to:
- Manipulate a cropping box with just one finger
- Tap the screen to clone an object
- Move the cloned object in either 2D or 3D space
- Delete cloned objects and start over
To achieve this experience, we used a fairly complicated render pipeline in Lens Studio. At a high level, it works like this:
- A perspective camera with device tracking builds a 3D map of the world, identifies horizontal planes, and renders what it sees to a render target.
- An orthographic camera starts with the image from the world camera and overlays the UI onto it before rendering to a separate render target that’s used for Live views—this way, the UI won’t be seen when watching a recorded Snap.
- When the box is tapped, a screen crop texture crops the capture target and copies the frame, saving it to another texture that is used to create our cloned object.
- The freeze frame texture is then fed into our SnapML model to create an output texture, which will become the alpha channel masking out the background around our object.
- A custom material graph combines the original freeze frame texture with the alpha mask to produce a texture that contains only pixels belonging to our cloned object.
- That material is then applied to a Screen Image (in 2D mode) or a Mesh in the Cutout prefab by Snap (in 3D mode).
- Touch and manipulation controls allow a user to then move the cloned sticker around the scene.
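The masking step in the pipeline above can be sketched in a few lines. In the Lens this compositing happens on the GPU inside a material graph, so the code below is only an illustration of the logic, assuming an RGB freeze frame and a per-pixel alpha mask from the model; the function and variable names are made up for this sketch.

```python
def apply_alpha_mask(frame, mask):
    """frame: 2D grid of (r, g, b) tuples; mask: 2D grid of alpha in [0, 1].
    Returns RGBA pixels where background pixels get alpha 0 (transparent)."""
    return [
        [(r, g, b, a) for (r, g, b), a in zip(frame_row, mask_row)]
        for frame_row, mask_row in zip(frame, mask)
    ]

# One row, two pixels: the left pixel belongs to the object, the right doesn't.
frame = [[(255, 0, 0), (0, 0, 255)]]
mask = [[1.0, 0.0]]

out = apply_alpha_mask(frame, mask)
# out -> [[(255, 0, 0, 1.0), (0, 0, 255, 0.0)]]
```

The resulting texture contains only the cloned object's pixels; everything else is fully transparent, which is what lets the sticker sit cleanly on top of the scene in both 2D and 3D modes.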
While things can get complicated at times, Lens Studio and SnapML are extremely powerful tools, and we had a lot of fun creating this experience!