Selfies are an art form. If you’re going to put your face out there, it’s got to look good. I’ve seen plenty of apps claim to offer the best filters, stickers, and retouching options, but most of them just fall flat. The Portrait app by Img.ly is one of the few that offers something truly unique.
It uses machine learning to automatically extract your face from the background in order to create awesome, poster-like styles that look different from anything else out there.
This week we caught up with Malte Baumann, one of the developers behind Portrait. To me, one of the most impressive aspects of his work is that it was done before Apple or Google released Core ML or TensorFlow Lite. You can read more about the process here.
What’s your background and how did you come up with the idea for Portrait?
I started developing for iOS in 2012 and came to deep learning through my master’s thesis on image segmentation in 2016. At img.ly we’re naturally concerned with edge deployment, so we decided to bring my image segmentation research to mobile devices, as that was quite an interesting challenge at the time and still is today.
The whole idea of focusing on portraits and selfies arose from the need to limit the data to a certain domain. This allowed us to keep our model size within a mobile app’s limits and maintain a real-time experience.
What does your tech stack look like, and what tools did you find helpful?
We do all our training in TensorFlow and use Docker to encapsulate our environments. The processing pipeline on the device itself is implemented with a combination of Metal Performance Shaders, some low-level algorithms in C++, and of course the PhotoEditorSDK, which uses Core Image. The most helpful tool was probably the Xcode console, as most debugging consisted of inspecting intermediate values.
What was the hardest part?
Porting the TensorFlow graph to MPS and building a real-time deep learning pipeline on iOS was by far the hardest part. While the underlying network doesn’t use the most complex operations, convincing all the MPS layers to replicate TensorFlow’s padding behavior and calculations was extremely cumbersome and took quite a while.
We could see that the padding was off by a pixel at layer X, but every little change propagated through the whole graph, which then led to new issues.
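To make the off-by-one issue concrete: TensorFlow’s SAME padding can be asymmetric, putting any extra pixel on the bottom/right edge, whereas MPS expects you to express padding through per-layer offsets, so a naive symmetric choice ends up one pixel off. A minimal Python sketch of the padding TensorFlow computes (the function name is mine, not from the app’s codebase):

```python
import math

def tf_same_padding(in_size, kernel, stride):
    """Padding TensorFlow applies along one axis for SAME convolutions.

    Returns (pad_begin, pad_end). When the total padding is odd,
    TensorFlow puts the extra pixel at the end (bottom/right).
    """
    out_size = math.ceil(in_size / stride)
    pad_total = max((out_size - 1) * stride + kernel - in_size, 0)
    pad_begin = pad_total // 2
    pad_end = pad_total - pad_begin
    return pad_begin, pad_end

# A 3x3 convolution with stride 2 over a 112-pixel edge needs one
# pixel of padding in total, and TensorFlow pads only the end:
print(tf_same_padding(112, 3, 2))  # → (0, 1)
```

Any framework that centers the padding (or clips via an offset, as MPS does) has to reproduce exactly this begin/end split per layer, which is why a one-pixel error at one layer ripples through the rest of the graph.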
On top of that, our pipeline had to perform as many inference runs as possible, while any images coming from the camera still had to be rendered at 30 FPS or more with no delay, as any lag would have significantly degraded the experience. It took a lot of work, but it felt great when everything finally clicked together.
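The interview doesn’t show how Portrait balances these two workloads, but a common pattern is to decouple camera rendering from inference: the camera keeps rendering every frame, while the inference loop always grabs only the newest frame and implicitly drops the ones that arrived while it was busy. A rough Python sketch of that idea (class and method names are hypothetical, not from the app):

```python
import threading

class LatestFrameSlot:
    """Single-slot mailbox between the camera and the inference loop.

    The camera thread overwrites the slot on every frame, so rendering
    never blocks; the inference thread takes whatever is newest and
    skips any frames it missed while a previous run was in flight.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None

    def publish(self, frame):
        # Called once per camera frame; overwrites any unprocessed frame.
        with self._lock:
            self._frame = frame

    def take(self):
        # Called by the inference loop; returns the newest frame once,
        # or None if nothing new has arrived.
        with self._lock:
            frame, self._frame = self._frame, None
            return frame

# If three frames arrive while inference is busy, only the last survives:
slot = LatestFrameSlot()
for frame in (1, 2, 3):
    slot.publish(frame)
print(slot.take())  # → 3
```

The key design choice is that the slot holds at most one frame, so segmentation always runs on the most recent image and the camera preview is never stalled waiting for the network.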
Do you have any advice for other developers who are looking to get started with machine learning?
There are lots of free, high-quality courses available online, and I’d definitely recommend doing one of them to get started. In my experience, picking a personal project soon afterwards really helps you stay focused.
If you immediately apply your new knowledge, you’ll quickly get drawn deeper and deeper into the field even after you’ve finished the course. And with Core ML, which makes deployment to mobile devices fairly easy, the results become even more accessible and fun to work on.