How next-gen AI accelerators will transform mobile machine learning

Introduction

In recent years, AI and (more specifically) machine learning have become so popular that the hype surrounding them has become contagious.

It’s becoming apparent that the widespread use of deep learning doesn’t stop at just computer scientists and programmers but has made its way into every domain, ranging from physics, biology, and chemistry to healthcare, transportation, and finance.

Even as the need for machine learning continues to expand across every part of our lives, there has recently been significant growth in its usage on smartphones and other mobile devices.

While many apps that use machine learning to recognize speech, faces, and gestures have been around for years, their accuracy has improved dramatically compared to earlier versions.

Just as machine learning is becoming more widespread by the day, mobile and IoT devices are increasingly replacing traditional computers.

As smartphones receive phenomenal hardware updates year after year (at a rate faster than conventional laptops and notebooks do), many have overtaken their not-so-mobile counterparts in terms of processing speed and compute power.

As more and more mobile apps start to perform deep learning tasks on their back end, there arises a need to do so more efficiently and at greater speeds.

So where do smartphones meet machine learning?

Here’s one of the official definitions of machine learning. Many people without a technical background or a grasp of its concepts might associate this definition with a complex procedure: collecting and analyzing terabytes of data, building deep neural networks with some super-complicated code, and then using a model to make predictions and boost revenue for Fortune 500 companies.

But in fact, some of the simplest tasks can be performed with deep learning using just a smartphone and its camera.

Image classification

One of the most common uses of a machine learning model is to classify an image according to what it represents. This task of predicting what an image represents is called image classification. Such a model is trained to recognize various classes of images. For example, a model might be trained to recognize pictures representing four different types of fruits: apples, oranges, grapes and bananas.

When we provide a new image as input to this trained model, it returns a set of probabilities, one for each type of fruit it was trained on, indicating how likely the image is to represent that fruit. The result is typically a list of class names, each paired with a probability.
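
To make that output concrete, here is a minimal Python sketch of how raw model scores could be turned into per-class probabilities with a softmax function. The labels follow the fruit example above. The raw scores are made up for illustration and do not come from any real model.

```python
import math

# Class labels from the fruit example in the text.
LABELS = ["apple", "orange", "grape", "banana"]


def softmax(scores):
    """Convert raw model scores into probabilities that sum to 1."""
    # Subtracting the maximum score keeps math.exp from overflowing.
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores, labels=LABELS):
    """Pair each label with its probability, most likely class first."""
    probs = softmax(scores)
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


# Hypothetical raw scores standing in for a real model's output on one image.
example_scores = [2.0, 0.5, 0.1, 3.2]
for label, prob in classify(example_scores):
    print(f"{label}: {prob:.2f}")
```

An app would usually show only the top entry of this list, or the top few if the probabilities are close.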

Pattern Recognition (Object Detection)

Another common use case for a deep learning model is detecting objects within an image. Given an image or a video stream, an object detection model can identify which of a known set of objects might be present, report how confident it is in each identification, and provide information about the objects’ positions within the image.

Such a model can be trained to detect the presence of multiple classes of objects. For example, a model might be trained with images that contain various pieces of fruit, along with a label that specifies the class of fruit each one represents (such as an apple, a banana, or an orange) and data specifying where each object appears in the image.

When we subsequently provide an image to the model, it will output a list of the objects it detects, the location of a bounding box that contains each object, and a score that indicates the confidence that the detection was correct. Such output is typically presented as a table with one row per detected object.
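
As a rough illustration of that output, the Python sketch below represents each detection as a label, a confidence score, and a bounding box, and keeps only the detections above a confidence threshold. The `Detection` class, the sample values, and the 0.5 threshold are all assumptions made for this example, not the format of any particular framework.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str    # class name, e.g. "apple"
    score: float  # confidence between 0 and 1
    box: tuple    # (top, left, bottom, right), normalized to the image size


def keep_confident(detections, threshold=0.5):
    """Drop detections whose confidence falls below the threshold."""
    return [d for d in detections if d.score >= threshold]


# Hypothetical detections standing in for a real model's output.
sample = [
    Detection("apple", 0.92, (0.10, 0.05, 0.45, 0.40)),
    Detection("banana", 0.88, (0.50, 0.30, 0.90, 0.75)),
    Detection("orange", 0.12, (0.20, 0.60, 0.35, 0.80)),
]

for d in keep_confident(sample):
    print(f"{d.label:<8} score={d.score:.2f} box={d.box}")
```

Raising the threshold trades missed objects for fewer false positives, so the right value depends on the app.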

Natural Language Processing (Smart Reply/Auto Reply)

These kinds of NLP models are especially visible in email applications such as Gmail, where they generate reply suggestions for incoming messages. For instance, TensorFlow’s smart reply model “generates reply suggestions based on chat messages. The suggestions are intended to be contextually relevant, one-touch responses that help the user to easily reply to an incoming message.”

Besides these, we have many other models that perform tasks like gesture recognition, speech recognition, image segmentation etc.

Predictive Analytics (Google Maps)

Predictive models are also an important use case for mobile machine learning. Take, for example, Google Maps’ model to predict parking difficulty.

According to Google’s AI blog, the feature uses a standard logistic regression model. Ground truth data is obtained by crowdsourcing quantitative answers from individual users about parking times in specific areas, and the model ultimately classifies areas as easy, medium, or limited.
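
The general idea of a logistic regression that sorts areas into three buckets can be sketched in a few lines of Python. This is not Google’s model: the features, weights, bias, and the 0.33 and 0.66 cut-offs below are hypothetical and chosen only to illustrate the technique.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def parking_difficulty(features, weights, bias):
    """Return a difficulty label and the underlying probability.

    The probability is the model's estimate that parking is hard.
    The thresholds that split it into three labels are assumptions.
    """
    z = bias + sum(w * x for w, x in zip(weights, features))
    p = sigmoid(z)
    if p < 0.33:
        label = "easy"
    elif p < 0.66:
        label = "medium"
    else:
        label = "limited"
    return label, p


# Hypothetical features for one area, e.g. scaled time of day and
# scaled number of recent visitors.
label, p = parking_difficulty([0.8, 1.5], weights=[1.2, 0.9], bias=-1.0)
print(label, round(p, 2))
```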

Apart from these, there are plenty of other applications of mobile machine learning, similar to their traditional counterparts.

How deep learning models are trained on a smartphone

As you might be aware, the process of building deep learning models can be broken down into two parts: training the model on known data that may or may not be labeled, and then running it on data it has not seen before (such as validation or test data) to obtain an inference.

Out of these two, training is actually the more computationally heavy process, because we are repeatedly teaching the model how to interpret the data.
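
To see why training costs so much more than inference, consider this toy Python sketch. It fits a single weight with gradient descent and counts how many times the model has to be evaluated along the way. The data, learning rate, and epoch count are assumptions picked for illustration. Real networks have millions of weights, which widens the gap considerably.

```python
def predict(w, x):
    """Inference: a single evaluation of the model."""
    return w * x


def train(data, epochs=100, learning_rate=0.01):
    """Training: evaluate and adjust the model over and over."""
    w = 0.0
    evaluations = 0
    for _ in range(epochs):
        for x, y in data:
            error = predict(w, x) - y
            evaluations += 1
            # Gradient of the squared error with respect to w.
            w -= learning_rate * 2 * error * x
    return w, evaluations


# Toy data following y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w, evaluations = train(data)
print(f"learned w = {w:.4f} after {evaluations} model evaluations")
print(f"one inference: predict(w, 4.0) = {predict(w, 4.0):.4f}")
```

Here training needs hundreds of evaluations plus a weight update for each, while a prediction afterwards needs exactly one.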

Because the computation is so heavy, training a model requires powerful hardware such as GPUs, and even then it might take weeks to train a model on a very large amount of data. These days, most models are trained in the cloud on external servers.

Mobile devices, on the other hand, have limited compute capability because of their size restrictions. Their power keeps increasing year after year, and many have already approached or surpassed the computational power of desktops or notebooks, but they still remain no match for the power of a server in the cloud.

However, the second part of the deep learning process, inference, doesn’t require such significant computational ability. Many day-to-day tasks that we perform on our smartphones, such as facial or speech recognition, are inference performed on the smartphone’s local processor.

But it would seem that new technologies are making possible what was hitherto unexpected. A few days ago, at WWDC 2019, Apple unveiled new capabilities in their upcoming Core ML 3 framework.

It extends the model types available in its predecessor and adds support for on-device training. Besides adding around 70 new neural network layer types, Core ML 3 allows pre-trained models to be retrained using locally generated data from iOS apps.

This paves the way for innovations such as federated learning, a new way of rolling out model updates: a smartphone downloads the current model, improves it by learning from locally generated data, and then sends the changes back as a small individual update.

Once this local update is sent back to the server, it is averaged with updates from other users to improve the shared model.
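
The averaging step can be sketched in a few lines of Python. This is a simplified illustration that treats a model as a flat list of weights and takes a plain, unweighted mean of the device updates. Production systems typically weight each update, for example by how much local data produced it, and add privacy safeguards. The function name and the numbers are assumptions for the example.

```python
def federated_average(global_weights, updates):
    """Apply the mean of several device updates to the shared model.

    Each update is a list of weight changes (deltas) computed on one
    device, with the same length as global_weights.
    """
    count = len(updates)
    averaged = [
        sum(update[i] for update in updates) / count
        for i in range(len(global_weights))
    ]
    return [w + delta for w, delta in zip(global_weights, averaged)]


# Hypothetical shared model and updates from three devices.
shared_model = [1.0, 2.0]
device_updates = [
    [0.2, -0.2],
    [0.4, 0.0],
    [0.0, -0.1],
]

new_model = federated_average(shared_model, device_updates)
print([round(w, 3) for w in new_model])
```

Only the small lists of deltas travel to the server in this scheme, so the raw data that produced them never has to leave the device.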

It’s all about compute power

Due to the amount of resources required to perform any significant machine learning, these days all except the simplest models are trained in the cloud.

To train a model, simply give the cloud computer access to your training data, and after the training is done, just download the learned parameters for the model and discard the training instance if you wish.

But keep in mind that you do pay for using cloud servers to train models. Once you have a trained model, you can do whatever you’d like with it.

Not only does this make the entire procedure machine-independent, but there are also a growing number of providers for such services, like Amazon EC2, Google Cloud Platform, and Azure to name a few of the most popular ones.

ML-as-a-Service

Besides, due to the growing boom in AI, many companies are offering machine learning as a service. These services are customized as per the client’s needs—for example, speech recognition, text analysis, or image classification. Here are some examples of such service providers.

The good thing about using such services is that you don’t need to have any of the technical expertise to train models. Just upload your training data, select a suitable model, and the machine learning service usually manages the rest of the work.

However, the difference here is that you aren’t just utilizing the computing power of a server but actually “renting” a ready-made model.

As such, you don’t own it, and hence cannot download it and retain it for future use. Think of it more like a pay-per-use model.

Another drawback is that the package comes as a whole and includes the inference part, which means you can’t run inference offline and need to use the provider’s API for it.

Besides, since you’re selecting one from a given number of models, there might always be the problem of not finding the perfectly-tailored fit.

The benefits of on-device machine learning

Performing AI techniques like deep learning on-device can get rid of several disadvantages that come with using a cloud-based service: slow inference speed due to sending/receiving data online, expenses for using a cloud service, concerns of privacy when data is leaving the device, and also total dependence on the cloud platform, which may fail.

Looking at the exponentially increasing computational power of mobile devices, coupled with upgrades in AI algorithms and related software, it isn’t too hard to imagine deep learning processes taking place locally on the device’s own hardware, without any external dependence. Such a scenario would not only reap a number of benefits but also open the door to many new possibilities.

Offline inference

Using a cloud-based service requires the constant exchange of information between the local device and a server, via an internet connection. It follows that performing ML tasks smoothly would require having constant and uninterrupted connectivity.

With mobile devices, this is harder to achieve and sometimes a loss of connectivity for a few seconds might mean having to start again from scratch.

But with on-device ML, developers can deploy models on any device at any given time without having to worry about network connectivity, which not only makes applications more portable but also reduces cloud costs.

Besides, it isn’t always possible to obtain access to networks, wireless or otherwise: say, within forest cover that is shielded from network signals, or underground where such signals have a tough time reaching.

Another potential advantage of carrying out the whole process offline is longer battery life, as mobile devices no longer have to spend energy transmitting data to and from the network.

Cost cutting

Once it’s possible to perform complicated ML processes offline, there’s no longer a need for cloud-based services, which saves the considerable expense of using them.

In particular, the hardware that these providers offer for ML tasks, such as GPUs and AI chips, is among the most expensive of their services. Besides, developers don’t have to worry about building the extra infrastructure needed to interact with a cloud service, and can devote more resources to the actual task at hand.

Security and Privacy

Since data does not need to be sent to any external source for processing, personal data can remain on the device.

This may be a big deal when you’re working with sensitive data, such as biometrics in the form of fingerprints, iris scans, vocal data, etc. It also removes the risk of such data being intercepted during its transfer to or from the cloud.

No time lag

Because all the data is processed locally and there’s no transfer between the device and a server, considerable time is saved and network latency is eliminated.

This can be particularly useful in tasks such as video streaming with a high frame rate, where lag compromises performance. On-device ML might also prove necessary in scenarios such as self-driving cars, where a vehicle needs to turn or brake and thus cannot afford a time lag.
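
A quick back-of-the-envelope calculation shows why the network round trip matters for video. The Python sketch below checks whether processing one frame fits inside the time budget set by the frame rate. All timings are hypothetical figures chosen for illustration, not measurements.

```python
def frame_budget_ms(fps):
    """Time available to process one frame, in milliseconds."""
    return 1000.0 / fps


def fits_in_frame(inference_ms, network_round_trip_ms, fps):
    """Check whether inference plus network time fits in one frame."""
    return inference_ms + network_round_trip_ms <= frame_budget_ms(fps)


FPS = 30  # assumed frame rate

# Hypothetical timings: slower on-device inference with no network hop,
# versus faster cloud inference with an assumed 80 ms round trip.
on_device_ok = fits_in_frame(inference_ms=20.0, network_round_trip_ms=0.0, fps=FPS)
cloud_ok = fits_in_frame(inference_ms=5.0, network_round_trip_ms=80.0, fps=FPS)

print(f"budget per frame: {frame_budget_ms(FPS):.1f} ms")
print(f"on-device keeps up: {on_device_ok}")
print(f"cloud keeps up: {cloud_ok}")
```

Under these assumptions the cloud model is four times faster at the computation itself, yet it still misses the frame budget because of the time spent on the network.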

Independence from external entities

What happens when the provider of a cloud service breaks down or has an outage? It could spell disaster for the hundreds (or hundreds of thousands for the big guys like AWS) of services that depend upon it. Independence from cloud-based services avoids such a scenario, however rare it may be.

How the so-called AI chip factors in

In the early days of deep learning and neural networks, companies and even tech giants were using central processing units (CPUs), the general-purpose processors found in every personal computer.

Due to the tremendous computing power required to train deep neural networks, though, CPUs have proved to be inefficient and slow, needing lots of time and money to produce results.

Once it was discovered that GPUs could do the same job many times faster, and that a student could train a deep learning network at home on a PC with a GPU faster than a company with several hundred CPUs, GPUs became the mainstream hardware for such workloads.

But in recent years, companies have started building chips specifically designed for AI-related tasks. Take, for example, Google’s Tensor Processing Unit (TPU), which is designed specifically to handle such tasks and outperforms general-purpose hardware on them.

While Wikipedia’s definition above still classifies the AI chip more or less as a microprocessor, apparently it’s not that simple. Initially, these chips were nothing but hardcore GPUs with a slightly modified architecture and several upgrades.

For the longest time, Nvidia has dominated the GPU industry. Ever since GPUs were found to be especially suited to deep learning workloads, their design, originally developed for rendering intensive graphics, has proven very efficient for training ML models on massive amounts of data.

There are also players such as Google, with its TPU, and more recently Graphcore, with its IPU (Intelligence Processing Unit) chip.

Aside from usual scenarios such as object detection, biometrics, and image stabilization while taking pictures, the advantages of onboard AI chips are opening the door to a variety of possibilities.

A year ago, Google unveiled new advances with AI such as using the Assistant app to make real phone calls to, say, book an appointment. Not only did Google Assistant do an uncannily good job of asking the perfect questions, it also performed realistic “ooh”s and “er”s in all the right places.

In fact, during a demonstration, the person on the other end was left completely unaware that they had just had a conversation with a machine, causing the crowd of attendees to question everything from its ethical to its social impact.

Similarly, Amazon’s AI-powered Alexa is the primary platform for the Echo and Dot smart speakers. Among other things, it can close doors, control lighting and air conditioning, order food and other goods, read audiobooks, and answer general questions.

These devices also gather user data, using it to learn about the user’s general behavior and to predict future requirements.

Another great example is LG’s integrated Vision AI, which can do some impressive stuff, such as using a camera image of a product to display related information, including where to buy it for less, and to recommend similar products.

Here are some fancy names for the chipsets that integrate them:

Huawei Kirin

In August last year, Huawei was one of the first to step into the AI chip market by announcing its newest system-on-a-chip, the Kirin 980. Built on a 7nm process, it was one of the first chips based on ARM’s Cortex-A76 CPU and Mali-G76 GPU, and it comes with a Cat.21 smartphone modem supporting speeds of up to 1.4Gbps and support for 2,133MHz LPDDR4X RAM.

While no larger than a fingernail, it contains around 7 billion transistors. It also contains two NPUs (Neural Processing Units), which can perform AI-assisted image recognition at a rate of around 75 images per second.

More recently, Huawei also announced “the industry’s highest-performance ARM-based CPU,” also featuring the same 7nm technology. Dubbed Huawei Kunpeng 920, the new CPU is designed to boost the development of computing in big data, distributed storage, and ARM-native application scenarios.

Apple A12 Bionic

Apple’s take on the AI chip, also unveiled around the same time, has six CPU cores, four GPU cores (supposedly twice as fast as the ones in the A11), along with an updated Neural Engine, which is the part of the chip used for handling AI tasks.

The Neural Engine contains eight cores and can perform 5 trillion operations per second, causing apps that utilize the company’s machine learning framework, Core ML, to run up to ten times faster than previously.
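
To put a figure like 5 trillion operations per second in perspective, the Python sketch below converts it into a theoretical ceiling on inferences per second. The model size of one billion operations per inference is a hypothetical round number. Real throughput is lower, since memory access and other overheads keep a chip from sustaining its peak rate.

```python
def peak_inferences_per_second(ops_per_second, ops_per_inference):
    """Upper bound on throughput if the chip ran at its peak rate."""
    return ops_per_second / ops_per_inference


NEURAL_ENGINE_OPS = 5_000_000_000_000  # 5 trillion ops/s, from the text
MODEL_OPS = 1_000_000_000              # hypothetical model: 1 billion ops

ceiling = peak_inferences_per_second(NEURAL_ENGINE_OPS, MODEL_OPS)
print(f"theoretical ceiling: {ceiling:.0f} inferences per second")
```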

Qualcomm Cloud AI 100

A couple of months ago, Qualcomm announced a “whole new signal processor that we designed specifically for AI inference processing” at its AI Day conference in San Francisco. And while the name may sound contradictory, Qualcomm asserted that the chipset was tailor-made for edge computing.

While available in a number of different modules, form factors, and power levels, it integrates a full range of developer tools, including compilers, debuggers, profilers, monitors, servicing, chip debuggers, and quantizers. It also features built-in support for machine learning frameworks like Google’s TensorFlow, Facebook’s PyTorch, Keras, MXNet, Baidu’s PaddlePaddle, and Microsoft’s Cognitive Toolkit.

Google Coral Dev Board with on-board TPU

Not exactly a chipset, the Coral Dev Board is in fact a single-board computer, much like the Raspberry Pi. Its selling point is that it features a removable system-on-module with one of Google’s custom tensor processing unit (TPU) AI chips.

Google also showcased their Coral USB Accelerator, a $74.99 USB dongle designed to speed up machine learning inference on existing Raspberry Pi and Linux systems.

While TPUs have so far been available for use with platforms such as Colab, having your very own TPU for offline machine learning tasks will probably be a welcome prospect for many developers.

These Edge TPUs send and receive data over USB and aren’t quite like the chips that accelerate algorithms in Google’s data centers. Those TPUs are liquid-cooled and designed to slot into server racks, and have been used internally to power products like Google Photos, Google Cloud Vision API calls, and Google Search results.

Edge TPUs, in contrast, are around the size of a coin and can handle calculations offline and locally, supplementing traditional microcontrollers and sensors.

While they may not be great at handling massive training datasets, they can run inference with TensorFlow Lite, the more lightweight version of TensorFlow designed for mobile devices.

AWS Inferentia

Aiming for a late 2019 release, Amazon is making sure their Inferentia chip has its place in the AI chip race. With built-in support from popular AWS products like EC2, SageMaker, and the new Elastic Inference service, the chip will provide hundreds of tera operations per second (TOPS) of inference throughput. Also, multiple Inferentia chips can be used together to speed up tasks.

Conclusion

Most smartphones currently sold have at least one AI-enabled feature, such as intelligent imaging, facial recognition, or a voice-activated personal assistant.

It’s estimated that by 2022, approximately 1.2 billion smartphones will be manufactured with some form of AI capability, and that this number will represent at least 75% of the total market for smartphones.

With such a whopping number of devices making use of AI-powered apps, a new generation of AI accelerators becomes more of a necessity than a fad. Dedicated AI cores are also expected to be present in most mid-range smartphones by the end of 2019.

Chip manufacturers are starting to acknowledge the shift in AI processing from the cloud to local systems, bearing in mind the contributions to privacy, efficiency, and speed.

While some manufacturers choose to boost performance with features such as better graphics, dedicated circuitry for AI, and larger amounts of cache memory for high-speed data access, others keep their chip designs smaller and less expensive and instead introduce various optimizations to stay in the race.

For example, very recently, ARM unveiled their Cortex-A77 CPU and Mali-G77 GPU for premium smartphones releasing in 2020.

While the GPU does not have any dedicated portion reserved for machine learning, ARM’s optimizations and tweaks give a significant boost to frameworks like TensorFlow.

As of now, the boom in AI chips and neural network cores seems to be more a result of a bunch of competitors trying to showcase their best efforts in outdoing each other.

While what we see now is likely the first of a new generation of processors, with machine learning and AI becoming more commonplace by the day, such chips might soon become as indispensable as any other standard hardware unit.