With the advent of neural networks, machine learning has gained immense popularity, and companies in just about every industry have started to apply some form of this vast technology to increase efficiency, improve throughput, or enhance customer experiences.
Artificial intelligence as a field has seen major breakthroughs in many areas within the past decade. With so many industries moving toward automation and building AI into their products, it has started to have a bigger impact on our day-to-day lives.
Now that these methods are used on such a large and varied scale, it has become clear that they have their own problems.
This article asks an important question: are the machine learning models we use intrinsically flawed?
A brief history
Adversarial attacks are machine learning techniques that attempt to fool models by supplying them with a ‘defective’ input. They can be thought of as optical illusions for machines.
The concept was first introduced in a 2014 paper by Google AI researchers Christian Szegedy et al. The techniques demonstrated in the paper were eye-opening: they showed that one of the most commercially valuable and highly anticipated areas of deep learning came with its own problems and could potentially be undermined.
Adversarial attacks are most observable in computer vision models, so most research has focused on well-known architectures like AlexNet and LeNet.
So, to prove how dangerous (to the point of hilarity) adversarial attacks are, I’ll provide some examples of adversarial attacks on some commonly used computer vision systems across industries. Later, we’ll also briefly review attacks on audio processing ML systems and defenses that can be used against these kinds of attacks.
How susceptible are computer vision models?
In 2014, a group of researchers at Google and NYU found that it was far too easy to fool conventional CNNs with a carefully constructed “nudge” to an input. In the image given below, we can see that just by adding some carefully chosen noise, the classifier identifies the given image of a panda as a gibbon with over 99% confidence!
To the naked eye (or the human brain), both images are easily identifiable as pandas. In fact, if you look closely, you can even see the noise that has been added to the image on the right. A CNN, however, is highly confident that the right image is a gibbon.
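The panda example is typically generated with the fast gradient sign method (FGSM), introduced by Goodfellow, Shlens and Szegedy (2014), which shifts every input value by a small amount eps in whichever direction increases the model's loss. Here is a minimal NumPy sketch of that idea. It assumes a toy logistic-regression "classifier" in place of a real CNN so the gradient can be written by hand; the weights, the input, and eps = 0.05 are hypothetical values picked for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fgsm(x, y, w, b, eps):
    """Fast gradient sign method against a logistic-regression classifier.

    x: input vector with values in [0, 1]; y: true label (0 or 1);
    w, b: model weights and bias; eps: maximum change allowed per feature.
    """
    p = sigmoid(w @ x + b)              # model's probability for class 1
    grad_x = (p - y) * w                # d(cross-entropy loss)/dx
    x_adv = x + eps * np.sign(grad_x)   # step that increases the loss
    return np.clip(x_adv, 0.0, 1.0)     # keep a valid "pixel" range


# Toy setup (hypothetical numbers): 1000 "pixels", weights alternate +/-0.1.
d = 1000
w = 0.1 * np.where(np.arange(d) % 2 == 0, 1.0, -1.0)
b = 0.0
x = 0.5 + 0.03 * np.sign(w)             # clean input, logit = 3.0

x_adv = fgsm(x, y=1, w=w, b=b, eps=0.05)

p_clean = sigmoid(w @ x + b)
p_adv = sigmoid(w @ x_adv + b)
print(f"clean p(class 1) = {p_clean:.3f}")   # about 0.953
print(f"adv   p(class 1) = {p_adv:.3f}")     # about 0.119
print(f"max per-pixel change = {np.max(np.abs(x_adv - x)):.3f}")
```

Each value moves by only 0.05, yet the logit drops by eps times the sum of |w_i| (0.05 × 100 = 5, taking it from 3 to -2). Tiny per-pixel changes add up across many dimensions, which is one explanation offered for why such attacks work so well on high-dimensional images.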
So, now that we know that we can fool computer vision models by essentially adding noise, what are the real world implications of this? A recent paper by researchers at KU Leuven in Belgium showed that we could easily fool systems that deployed YOLOv2 to track people by printing out a patch of noise (similar to that in the above example) and holding it in front of them.
Printing this pattern on a shirt and walking around would essentially make a person invisible to systems that track people (most video analytics solutions rely on person tracking). Or would it? According to David Ha, a research scientist at Google Brain, the patch wouldn’t work against a system that uses a different variant of YOLOv2. Still, it feels good to know that a single 16×12 sheet of paper can make you invisible to some systems.
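An adversarial patch differs from the panda noise: instead of nudging every pixel slightly, it confines the change to one region but lets those pixels take any valid value. The sketch below illustrates that idea under heavy simplifying assumptions. A hypothetical linear-sigmoid "person score" stands in for YOLOv2, and the patch is optimized with signed gradient steps on a 20×20 grayscale image. The real attack optimizes against the detector's own outputs and has to survive printing and a camera, none of which this toy models.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def optimize_patch(image, mask, w, b, steps=50, lr=0.05):
    """Change only the pixels inside `mask` to lower a detector's score.

    image: (H, W) array in [0, 1]; mask: boolean (H, W) array marking the patch;
    w, b: weights and bias of a hypothetical linear "person score".
    """
    x = image.copy()
    for _ in range(steps):
        p = sigmoid(np.sum(w * x) + b)           # detection probability
        grad = p * (1.0 - p) * w                 # dp/dx for this linear-sigmoid model
        x[mask] -= lr * np.sign(grad[mask])      # update patch pixels only
        x = np.clip(x, 0.0, 1.0)                 # patch values must stay in [0, 1]
    return x


# Hypothetical setup: checkerboard weights and an image the toy detector flags.
size = 20
signs = np.where(np.add.outer(np.arange(size), np.arange(size)) % 2 == 0, 1.0, -1.0)
w = 0.05 * signs
b = 0.0
image = 0.5 + 0.2 * signs                        # clean logit = 4.0
mask = np.zeros((size, size), dtype=bool)
mask[:12, :12] = True                            # 12x12 patch in the top-left corner

patched = optimize_patch(image, mask, w, b)
p_before = sigmoid(np.sum(w * image) + b)
p_after = sigmoid(np.sum(w * patched) + b)
print(f"person probability: {p_before:.2f} -> {p_after:.2f}")  # about 0.98 -> 0.26
```

Pixels outside the mask are never touched, which is what makes the attack printable as a single object held in front of the camera.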
Another nefarious use-case of adversarial attacks (or adversarial patches) can be seen in a more commonly-used subclass of computer vision—facial recognition. Research carried out by Sharif et al. (2016) showed that one can fool facial recognition models by constructing glasses that not only conceal your identity, but can also make you someone else altogether.
In what is known as a targeted adversarial attack, the researchers at Carnegie Mellon University showed that they could steer a misclassification toward a specific output: using specially crafted glasses, they fooled facial recognition systems into mistaking them for celebrities.
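The difference between untargeted and targeted attacks comes down to which loss the attacker follows. An untargeted attack increases the loss for the true class, while a targeted one decreases the loss for a class the attacker picks. Below is a minimal sketch of an iterative, FGSM-style targeted attack. It assumes a hypothetical three-class linear softmax classifier with hand-picked weights instead of a face recognition network; the eyeglass attack itself only altered the pixels of the glasses frame, which is not modeled here.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - np.max(z))       # subtract max for numerical stability
    return e / e.sum()


def targeted_attack(x, target, W, b, eps=0.03, alpha=0.01, steps=10):
    """Iterative targeted attack on a linear softmax classifier (logits = W @ x + b).

    Each step moves x to *decrease* the cross-entropy loss for `target`,
    then projects back into an L-infinity ball of radius eps around the
    original input and into the valid [0, 1] range.
    """
    x_adv = x.copy()
    onehot = np.eye(W.shape[0])[target]
    for _ in range(steps):
        p = softmax(W @ x_adv + b)
        grad_x = W.T @ (p - onehot)                # d(CE loss for target)/dx
        x_adv = x_adv - alpha * np.sign(grad_x)    # step toward the target class
        x_adv = np.clip(x_adv, x - eps, x + eps)   # stay within eps of the original
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv


# Hypothetical 3-class model over 300 features.
d = 300
u = np.tile([1.0, -1.0, 1.0, -1.0], d // 4)    # weight pattern for class 0
v = np.tile([1.0, 1.0, -1.0, -1.0], d // 4)    # weight pattern for class 1 (orthogonal to u)
W = 0.1 * np.stack([u, v, np.zeros(d)])        # class 2 has all-zero weights
b = np.zeros(3)

x = 0.5 + 0.02 * u                             # clean input: logits [0.6, 0, 0]
x_adv = targeted_attack(x, target=1, W=W, b=b)

print(np.argmax(W @ x + b))       # original prediction
print(np.argmax(W @ x_adv + b))   # prediction after the attack
```

Here the attack pushes the input toward class 1 specifically rather than merely away from class 0; class 2 would also count as a "success" for an untargeted attack, but not for this one.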
So much for identity concealment and avoiding detection. But one question naturally arises after looking at these examples: can these attacks be unknowingly dangerous? The short answer: YES.
For instance, with the improvements in computer vision, there has been an increase in the hype around self-driving cars. Companies like Tesla have already made it clear that they employ computer vision techniques for most major functionalities associated with autonomous driving — from lane detection to object detection to street sign identification and beyond.
The image above is a clear illustration of how badly things could go if this were to happen in real life. A STOP sign misread by a self-driving car could be devastating for anyone in the immediate vicinity.
Fortunately, there has been cutting-edge research on how to defend against adversarial attacks, and progress is being made every day on training models that are more robust to such attacks. These defenses are briefly discussed at the end of this article.
Adversarial attacks in audio-based systems
Researchers at the University of California demonstrated in 2018 that it’s possible to add the sound equivalent of an adversarial patch to a soundwave (also called an adversarial perturbation). Such a perturbation can completely change speech-to-text transcriptions, or even hide speech inside other types of audio, like music.
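How "quiet" an audio perturbation is can be quantified in decibels relative to the original waveform, for example as 20·log10(max|δ|) − 20·log10(max|x|). The sketch below computes that one measure. It assumes a 16 kHz sample rate, uses a 1 kHz tone as a stand-in for speech, and uses random noise as a stand-in for the perturbation; actually finding a perturbation that changes a transcription requires optimizing against a speech model, which is not shown.

```python
import numpy as np


def db(signal):
    """Peak level in decibels: 20 * log10(max |sample|)."""
    return 20.0 * np.log10(np.max(np.abs(signal)))


def relative_db(x, delta):
    """Loudness of perturbation delta relative to waveform x (more negative = quieter)."""
    return db(delta) - db(x)


sr = 16_000                                   # sample rate in Hz (assumed)
t = np.arange(sr) / sr                        # one second of timestamps
x = 0.5 * np.sin(2 * np.pi * 1000 * t)        # stand-in for speech: a 1 kHz tone

rng = np.random.default_rng(0)
delta = rng.uniform(-1.0, 1.0, size=x.shape)  # random stand-in for an optimized perturbation
delta *= 0.005 / np.max(np.abs(delta))        # scale so its peak is 0.005

x_adv = np.clip(x + delta, -1.0, 1.0)         # perturbed audio, kept in the valid range
print(f"{relative_db(x, delta):.1f} dB")
```

A perturbation whose peak is 1/100 of the signal's peak comes out at -40 dB, quiet enough to pass as background hiss.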
Since so many households now have a Google Assistant, Alexa, or Siri device that is always listening, these attacks could completely compromise voice assistant systems. The same researchers showed that by adding white noise to YouTube videos, one can secretly activate AI systems on phones and smart speakers to unlock doors, transfer money, or buy stuff online.
The animated series South Park also has an entire episode built around voice recognition assistants hurling out obscenities due to voice commands.
Attempted defenses against adversarial examples
Traditional techniques used to make ML models more robust (like weight decay and dropout) usually fall flat against adversarial examples. To combat these attacks, researchers have come up with entirely new methods for making models robust to adversarial inputs.
Adversarial Training can be described as a brute-force solution: the model is explicitly trained not to be fooled by adversarial inputs by including them in the training data. It is an active field of research, and new methods are regularly proposed for integrating adversarial training into the regular training process more effectively. A recent paper by Xie et al. presents a take on adversarial training that uses smooth approximations of the ReLU activation.
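A minimal sketch of the adversarial training loop follows. It assumes a logistic-regression model, FGSM as the attack used to generate training inputs, an equal mix of clean and adversarial examples, and synthetic 2-D data; all of these choices, including eps = 0.1, are illustrative rather than taken from any particular paper. The key step is regenerating the adversarial examples against the *current* weights every epoch, so the model keeps seeing attacks tailored to its latest state.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fgsm_batch(X, y, w, b, eps):
    """FGSM inputs for a logistic-regression model, one row per example."""
    p = sigmoid(X @ w + b)
    grad_X = (p - y)[:, None] * w[None, :]   # d(loss)/dX, row by row
    return X + eps * np.sign(grad_X)


def adversarial_train(X, y, eps=0.1, lr=0.1, epochs=200):
    """Full-batch gradient descent on clean inputs plus FGSM copies of them."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        X_adv = fgsm_batch(X, y, w, b, eps)   # attack the current model
        X_all = np.vstack([X, X_adv])
        y_all = np.concatenate([y, y])        # adversarial copies keep the true label
        err = sigmoid(X_all @ w + b) - y_all  # gradient of cross-entropy w.r.t. logits
        w -= lr * X_all.T @ err / len(y_all)
        b -= lr * err.mean()
    return w, b


# Synthetic, hypothetical data: two Gaussian blobs in 2-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, size=(100, 2)),
               rng.normal(2.0, 0.5, size=(100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])

w, b = adversarial_train(X, y)
clean_acc = np.mean((sigmoid(X @ w + b) > 0.5) == y)
adv_acc = np.mean((sigmoid(fgsm_batch(X, y, w, b, 0.1) @ w + b) > 0.5) == y)
print(f"clean accuracy: {clean_acc:.2f}, accuracy under FGSM: {adv_acc:.2f}")
```

On data this easy, a normally trained model would also score well, so the numbers say little on their own; the point of the sketch is the structure of the loop. Stronger variants swap FGSM for a multi-step attack but keep the same shape.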
Another method that has been used to increase the robustness of ML models is Defensive Distillation. In this strategy, a second model is trained on the class probabilities (soft labels) produced by a first model, rather than on hard decisions about which class each input belongs to.
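In Papernot et al.'s (2016) formulation of defensive distillation, a first network is trained with a softmax at a high temperature T, a second network is trained on the first network's soft labels at that same temperature, and the second network is deployed at T = 1. The sketch below shows only the two ingredients that differ from ordinary training, the temperature softmax and the soft-label loss; the training loops are omitted, and the teacher logits and T = 20 are hypothetical values.

```python
import numpy as np


def log_softmax_t(logits, T=1.0):
    """Log-softmax at temperature T, computed stably."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def softmax_t(logits, T=1.0):
    """Softmax at temperature T; a larger T gives softer probabilities."""
    return np.exp(log_softmax_t(logits, T))


def distillation_loss(student_logits, teacher_logits, T):
    """Cross-entropy between the teacher's soft labels and the student's
    predictions, both taken at the same temperature T."""
    soft_labels = softmax_t(teacher_logits, T)
    return -np.mean(np.sum(soft_labels * log_softmax_t(student_logits, T), axis=-1))


teacher_logits = np.array([[2.0, 1.0, 0.0]])   # hypothetical teacher output for one input
print(softmax_t(teacher_logits, T=1.0))         # fairly peaked on class 0
print(softmax_t(teacher_logits, T=20.0))        # close to uniform: every class near 1/3
```

At T = 20 the soft labels still rank the classes in the same order but carry much more information about how similar the classes look to the teacher, which is what the distilled model learns from.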
Adversarial examples show that many modern machine learning and deep learning algorithms can be broken in surprising ways, and that the problem is inherent.
These failures are clear examples of how even simple algorithms, when provided with specific inputs, behave very differently than what the designer intended.
The ML community, which is growing every day, has started to get involved and design methods and techniques for preventing adversarial examples, in order to close this gap between what designers intend and how algorithms work.