With fast-paced advances in neural network architectures, deep and machine learning research, and ever-increasing hardware and software resources, the number of incredible demo projects seems to grow at a near-dizzying rate.
From AI-generated art and enhanced accessibility, to tracking human movement in real-time, and beyond, we’ve curated some of our favorite deep learning projects with accompanying visual demos.
While there are undoubtedly countless more projects we could find and highlight, hopefully this list gives you a high-level view of what researchers, practitioners, and even artists are creating with machine and deep learning in 2019.
One interesting (and perhaps not surprising) note is that many of these projects use generative adversarial networks (GANs) to create their visually appealing demos: GANs naturally make for great visual effects, and it would be great to see more of these experiences in production-level applications.
To supplement the demos themselves, I’ve tried to provide any linked resources (papers, code, project pages, full videos, etc.) where available. If you have an awesome demo you think we should add to this list, let us know in the comments!
WikiArt with style transfer + StyleGAN
Here, we essentially see what’s possible when combining style transfer with a StyleGAN. This lets us apply the style from a reference image directly onto images sampled from the GAN’s latent space.
For at least the GANs part of this project, Gene forked and used NVIDIA’s repo for the progressive growing of GANs:
3D Pose Estimation in Unity
Blending 3D pose estimation with 3D dev platforms and rendering engines like Unity allows for fascinating AR projects like this one. By combining these two powerful technologies, AR objects can more accurately follow human movement in 3D space.
Architectural machine translation
This project takes video frames of objects as input and outputs modernist architectural renderings. Such an interesting application of image-to-image translation.
Here’s a starting point for the code — I couldn’t find the direct source code, so if anyone knows where to find it, leave me a note in the comments:
Removing cars from images and videos
Chris might not be as impressed as we are with this demo. A Vanilla Sky-esque project that masks moving and parked cars on a city street via a vehicle detection network, with an AR blurring effect added to those detected vehicles.
Here’s a detailed overview that discusses the what, why, and how of this project.
Translating images to unseen domains with a GAN
— Ming-Yu Liu, NVIDIA
From the abstract:
Drawing inspiration from the human capability of picking up the essence of a novel object from a small number of examples and generalizing from there, we seek a few-shot, unsupervised image-to-image translation algorithm that works on previously unseen target classes that are specified, at test time, only by a few example images. Our model achieves this few-shot generation capability by coupling an adversarial training scheme with a novel network design.
Infinite Patterns
Alex works at Google and is the creator of DeepDream, a computer vision program that uses a neural network to find and create patterns in images. These hyper-processed, infinite loops are often dreamlike, or even hallucinogenic. I almost think of these as moving wallpaper. He’s got a few of them on the Twitter thread below, so make sure to follow the link to check them out.
This article over at Experiments with Google explores this project and discusses how it ended up as a collaboration with Pinar&Viola, a digital art duo:
Full 3D home try-on from a single image
Recently, we’ve seen a surge in interest in try-on experiences—retailers like Gucci are exploring ways to allow their users to engage with their products from the comfort of their own homes (or on the subway, or at work — you get the idea).
But those experiences are only as good as the 3D representations underpinning them. This project introduces “Pixel-aligned Implicit Function (PIFu), a highly effective implicit representation that locally aligns pixels of 2D images with the global context of their corresponding 3D object.”
GANs + Pixel Art
An interesting AI art project that runs a GAN, trained on the eBoy dataset, on an infinite loop. For those unfamiliar, eBoy creates reusable pixel objects and uses them to create art, make toys, and more.
Trajectory Prediction
— Posted by deeplearning.ai, from research out of Carnegie Mellon, Google AI, and Stanford
Predicting where people are going to move is a fascinating application in the realm of human activity tracking. Use cases abound, from understanding customer behavior in retail to analyzing crowds, and more. Taking this a step further, this demo includes predictions on the nature and context of a given person’s activity (e.g., transport, work, etc.).
Orange juice as a lens into an AR world
— キヨ
An amazing look at how augmented reality can be used to blend imaginative digital worlds with objects in the real world. I wasn’t able to find any of the underlying code or a project page, but this demo shows the potential AR + ML has in unlocking these kinds of imaginative, artistic experiences.
A model learning to forget a face
— posted by Joseph Reisinger
The quote in the Tweet below summarizes this project pretty well. We see a lot of demos that show a neural network generating new, photorealistic people and images. But here, we see the process in reverse — moving from a generated portrait to a deconstruction, as the network’s neurons get switched off one-by-one.
Body part segmentation with TensorFlow.js
— Google Developers
From Google I/O 2019, this demo from the TensorFlow team showcases real-time movement tracking with image segmentation. To make this experience work in real-time, they run two body-part segmentation models, match them up, run dynamic time warping, and then play and encode a video, with GPU acceleration via TensorFlow Lite.
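The dynamic time warping step is the easiest piece to illustrate in isolation. Below is a minimal NumPy sketch (my own, not the TensorFlow team’s code) that aligns two sequences of per-frame feature vectors, for example downsampled segmentation masks or keypoint coordinates from the two models:

```python
import numpy as np

def dtw_align(seq_a, seq_b):
    """Align two sequences of per-frame feature vectors with dynamic time warping.

    seq_a: (n, d) array, seq_b: (m, d) array. Returns the optimal alignment cost
    and the warping path as a list of (i, j) frame-index pairs.
    """
    n, m = len(seq_a), len(seq_b)
    # Pairwise Euclidean distances between every frame of seq_a and seq_b.
    dist = np.linalg.norm(seq_a[:, None, :] - seq_b[None, :, :], axis=-1)

    # Accumulated-cost matrix with an infinite border so the path starts at (0, 0).
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j - 1],  # advance in both sequences
                acc[i - 1, j],      # advance in seq_a only
                acc[i, j - 1],      # advance in seq_b only
            )

    # Backtrack from the corner to recover the warping path.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return acc[n, m], path[::-1]
```

The returned path tells you which frame of one sequence lines up with which frame of the other, which is the piece you need before scoring or compositing the two videos.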
Turn yourself into a 3D avatar with pose estimation
— 青絵
This experience combines pose estimation and AR to transform a user into a monster (or ostensibly any 3D avatar character). This demo is transformative in the most literal sense of the word. What’s perhaps most impressive is the accuracy and precision of the movement tracking—the avatar matches the user’s movement incredibly well. Also some cool UX stuff, including a gesture-based transformation mechanism, and a neat sound effect while the avatar is generating.
Tracking garbage and other sidewalk obstructions with object detection
What’s most impressive to me here is how the trash is detected — the detections are nearly instantaneous, even with the camera moving quickly. This kind of application has the potential to change how we think about so-called “Smart Cities”.
Paint photorealistic landscapes with GANs
— posted by Kevin Lim
If you watch this demo, you’ll notice that the human-made drawings on the left look like something you might have seen in older applications like MS Paint. To see those coarse landscape representations generated and rendered as photorealistic landscapes is something to behold. This is the kind of application that would trick me into thinking I’m actually an artist!
Neural network time lapse (GAN)
— posted by Spiros Margaris
I struggled to find out who’s actually behind this project, but its surreal and iterative nature is incredibly appealing.
From the YouTube video description:
On the left is the source face, Theresa May. The next column is the program learning her face and rebuilding the picture of her using its learned model. Next up is the destination face and the model it is building up. Then finally on the far right, fifth column (lol) is a recreation of Theresa May’s face but matching the position and expression of the destination face.
Inverse scene rendering from a single image
From the paper’s abstract (linked below):
We show how to train a fully convolutional neural network to perform inverse rendering from a single, uncontrolled image. The network takes an RGB image as input, regresses albedo and normal maps from which we compute lighting coefficients.
ARKit 3 body segmentation with particle effects
Another really appealing visual demo of blending AR effects with deep learning. Here, Laan Labs (a boutique ML/CV shop specializing in edge tech) applies a dissolving particle effect on top of a people segmentation model.
Real-time finger detection with YOLO
Andrew’s done an excellent job of describing what’s going on under the hood in this demo. Impressive real-time results on iOS. And as Andrew mentions, lots of possibilities to add onto this baseline experience—AR, tracking, etc. Finger puppets? Piano lessons? Love the potential of this on-device demo.
Generating text in a mobile app with GPT-2
The folks over at Hugging Face have been making some incredible strides with transformers and other NLP architectures. And not just server-side — they’ve also been working on model distillation in an effort to embed these incredibly powerful language models on-device. This demo looks specifically at text generation/autocomplete.
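If you want to play with the same idea server-side, the sketch below uses Hugging Face’s transformers pipeline API with the standard GPT-2 checkpoint. The on-device demo itself ships a distilled model inside an iOS app, so treat this only as the desktop analogue:

```python
# Minimal text-generation sketch with Hugging Face transformers (a desktop
# analogue of the on-device demo, not its actual code).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "The best thing about on-device machine learning is"
completions = generator(prompt, max_length=40, do_sample=True, num_return_sequences=3)
for c in completions:
    print(c["generated_text"])
```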
Weight agnostic neural networks
— hardmaru
The idea of a weight agnostic neural network is incredibly compelling, and it leads us to question how important weight params are when compared to underlying architecture. The abstract does a nice job of teasing out this dynamic:
Not all neural network architectures are created equal, some perform much better than others for certain tasks. But how important are the weight parameters of a neural network compared to its architecture? In this work, we question to what extent neural network architectures alone, without learning any weight parameters, can encode solutions for a given task.
MediaPipe: a framework that combines deep learning and traditional CV pipelines
— Google AI, posted by Dimitri Diakopoulos
MediaPipe is Google’s relatively new pipeline for combining traditional CV tasks with deep learning models. This new framework really opens the door for more immersive and responsive AR experiences.
Full 3D pose estimation: body, hands, and face
— CMU, posted by HCI Research
This project represents the first method for capturing total 3D motion using a monocular view input. The technique generates a 3D deformable mesh model, which is then used to reconstruct total body pose. The “total” part of this equation is the most impressive to me, just from a visual standpoint. The ability to reconstruct pose for face, body, and hands in this way enables a truly stunning demo.
Underwear detector
Totally understandable, Nick. But in all seriousness, ML-powered NSFW filters are an incredibly useful application for moderating user-generated content.
BERT, running completely on-device
Above, we discussed Hugging Face’s continued efforts to bring the most powerful language models to edge devices like smartphones. This demo looks at an implementation of question answering with BERT on iOS.
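For reference, here’s what the same task looks like with the Python transformers pipeline. This is a server-side sketch that assumes a SQuAD fine-tuned BERT-family checkpoint; the iOS demo runs a converted, quantized model instead:

```python
# Extractive question answering with a BERT-family model via Hugging Face.
from transformers import pipeline

qa = pipeline("question-answering")  # loads a default SQuAD fine-tuned model

result = qa(
    question="Where does the model run?",
    context="The demo runs a distilled BERT model entirely on-device on an iPhone.",
)
print(result["answer"], round(result["score"], 3))
```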
“Latent History”: Collective photographic memories
An amazing visual demo that evokes a deep sense of, and appreciation for, history blended with modernity. VentureBeat’s profile of this project (linked below) explains this well:
This piece generates imagery from a data set of 300,000 photos, including 150-year-old Stockholm city archives and colorful images taken from the same location within the past 15 years.
The resulting effect is an artistic rendering of collective memory blended with (roughly speaking) the present day.
Masking and image in-painting to remove phones from selfies
Is a mirror selfie without the phone in the picture actually a mirror selfie, or is it something else? Whatever it might be classified as, Abhishek Singh’s cool demo works in 3 steps (sketched in code after the list):
- A segmentation model classifies each pixel belonging to the phone class of objects.
- A pixel-level mask is applied to the segmented phone.
- Image in-painting is applied to the segmented phone to create the blurred effect.
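Here’s a rough sketch of steps 2 and 3 using OpenCV’s classical in-painting, assuming you already have a binary phone mask from a segmentation model. The actual demo presumably uses a learned in-painting network, so this is just to make the flow concrete:

```python
import cv2
import numpy as np

img = cv2.imread("mirror_selfie.jpg")
# Assumed output of step 1: a mask that is 255 wherever the phone was detected.
phone_mask = cv2.imread("phone_mask.png", cv2.IMREAD_GRAYSCALE)

# Dilate slightly so the fill also covers the phone's edges, then in-paint the region.
phone_mask = cv2.dilate(phone_mask, np.ones((7, 7), np.uint8))
result = cv2.inpaint(img, phone_mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)
cv2.imwrite("no_phone_selfie.jpg", result)
```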
Generating entire videos with DVD-GAN
— Aidan Clark, Jeff Donahue, Karen Simonyan
Here we have a generative model that is able to generate video with high complexity and fidelity. This kind of video sample generation could be a game changer for the viability of synthetic datasets. There’s quite a bit of work around producing images with GANs (as showcased in many of the projects here), but producing high-quality video opens up a wide range of possibilities in terms of data generation, video synthesis, and video prediction tasks, among others.
Searching BigGAN’s latent space for a previously-generated image
The Tweet here says it all. What’s more, this thread includes more demo videos showcasing the project’s progress through additional iterations. Amazing to see all the different ways ML engineers are using GAN latent space.
Removing objects from sports footage with neural in-painting
In some ways similar to his mirror selfie project (showcased above). This one made me laugh — for some reason, it’s really entertaining to see world-class athletes chasing after an invisible ball.
Abhishek also provided a quick overview of the project structure:
“MaskRCNN trained on coco dataset to identify and segment objects -> mask them out and delete the pixels -> edge connect model trained on places2 dataset to fill in the missing pixels.”
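The first step of that pipeline is straightforward to reproduce with torchvision’s COCO-pretrained Mask R-CNN. A sketch of pulling out a “sports ball” mask might look like the following; the EdgeConnect in-painting step is a separate model and isn’t shown here:

```python
# Step 1 of the pipeline (my sketch, not Abhishek's code): get an object mask
# from a COCO-pretrained Mask R-CNN so it can be deleted and in-painted.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True).eval()

img = to_tensor(Image.open("match_frame.jpg").convert("RGB"))
with torch.no_grad():
    pred = model([img])[0]

# In torchvision's COCO label mapping, id 37 is "sports ball"; keep confident
# detections and merge their masks into one boolean HxW mask.
keep = (pred["labels"] == 37) & (pred["scores"] > 0.7)
ball_mask = (pred["masks"][keep, 0] > 0.5).any(dim=0)
```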
HoloGAN: learning geometric representations of objects
Super impressive to see disentangled 3D representations coming solely from single-view 2D images.
From the abstract:
Our experiments show that using explicit 3D features enables HoloGAN to disentangle 3D pose and identity, which is further decomposed into shape and appearance, while still being able to generate images with similar or higher visual quality than other generative models. HoloGAN can be trained end-to-end from unlabelled 2D images only. Particularly, we do not require pose labels, 3D shapes, or multiple views of the same objects. This shows that HoloGAN is the first generative model that learns 3D representations from natural images in an entirely unsupervised manner.
Using pose estimation to automatically change text size in your browser
I’m a big fan of ML projects that are working to make the tools we use every day more accessible. Here’s one that changes the text size inside a browser window as a user moves closer or further away from the screen — in real-time.
“Gym City”: Sim City with neural networks
— Sam Earle
The GitHub repo for this project explains what’s going on here better than I ever could, so I’ll let it do the work here:
A Reinforcement Learning interface for variable-scale city-planning-type gym environments, including Micropolis (open-source SimCity 1) and an interactive version of Conway’s Game of Life.
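Since the project exposes standard Gym environments, interacting with one follows the usual Gym loop. The environment id below is hypothetical (check the repo’s README for the registered names), but the interface is the standard one:

```python
import gym

env = gym.make("Micropolis-v0")  # hypothetical id; see the repo for the real one
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()           # random agent as a placeholder
    obs, reward, done, info = env.step(action)   # standard Gym step tuple
env.close()
```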
Motion style transfer using GANs and a single reference image
— SVIP Lab
Motion imitation + style transfer is such a cool idea, and the unified framework presented in this project has a wide range of transformational possibilities.
The abstract contains an excellent high-level description of this project, including its points of differentiation from other methods:
Existing task-specific methods mainly use 2D keypoints (pose) to estimate the human body structure. However, they only express the position information with no abilities to characterize the personalized shape of the individual person and model the limbs rotations. In this paper, we propose to use a 3D body mesh recovery module to disentangle the pose and shape, which can not only model the joint location and rotation but also characterize the personalized body shape. To preserve the source information, such as texture, style, color, and face identity, we propose a Liquid Warping GAN with Liquid Warping Block (LWB) that propagates the source information in both image and feature spaces, and synthesizes an image with respect to the reference.
Spiral, by DeepMind: Paint a portrait in 19 brushstrokes
— Yaroslav Ganin and DeepMindAI
Essentially, this project allows users to unconditionally generate images from the CelebA-HQ dataset in 19 steps (termed brushstrokes here). I really like the oil paint aesthetic here, as well.
Relighting portraits after they’ve been taken
— Hao Zhou, Sunil Hadap, Kalyan Sunkavalli, and David Jacobs
I’m certainly not any kind of talented photographer, but getting the right lighting on pictures that I do take is consistently difficult. Thus, being able to automatically target a specific lighting level for a poorly-lit image is a really appealing idea.
And finally, in the spirit of Halloween…Jack-o-GANterns 🎃
Had to fit a pun in here. A fun and seasonal use of a GAN — a few of these look like designs I’ve tried (and failed) to carve over the years. Let the horror show begin.