Hardware acceleration for machine learning on Apple and Android devices

Recently, Apple showcased its new iPhone lineup, launching the iPhone XS, XS Max, and XR. The most remarkable feature of this new line of devices is what Apple calls its smartest and most powerful chip ever: the A12 Bionic SoC (system on a chip).

Apple says the A12 Bionic includes an 8-core Neural Engine dedicated to advanced, real-time machine learning applications. Similarly, Google has improved the Neural Networks API in its Android framework in the latest OS, Android 9.0 Pie. Both are examples of something called hardware acceleration.

What is Hardware Acceleration?

According to Wikipedia:

In short, hardware acceleration is the process of employing hardware to allow software to run more efficiently on a computer.

Why do we need it?

The computational power of our CPUs has steadily increased, which has allowed for the development of more CPU-intensive software that performs millions to billions of calculations per second. This evolution has included high-definition games, video editing applications, and, in the last few years, machine learning.

Some of our software applications have become so computationally demanding that CPUs alone can’t handle them efficiently. Updating millions of pixels on the screen many times per second, while also running the entire operating system on just a few CPU cores, has become untenable.
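To put “millions of pixels” in perspective, here is a rough calculation in Python. It assumes a hypothetical 1920 by 1080 display refreshing 60 times per second with 4 bytes (RGBA) per pixel; these figures are illustrative, not taken from any specific device.

```python
def pixels_per_second(width: int, height: int, fps: int) -> int:
    """Number of pixel values that must be produced every second."""
    return width * height * fps

# Illustrative assumptions: a 1080p screen at 60 frames per second, 4 bytes (RGBA) per pixel.
WIDTH, HEIGHT, FPS, BYTES_PER_PIXEL = 1920, 1080, 60, 4

frame_pixels = WIDTH * HEIGHT                       # pixels in a single frame
per_second = pixels_per_second(WIDTH, HEIGHT, FPS)  # pixels updated per second
bytes_per_second = per_second * BYTES_PER_PIXEL     # raw pixel data per second

print(f"{frame_pixels:,} pixels per frame")      # 2,073,600 pixels per frame
print(f"{per_second:,} pixels per second")       # 124,416,000 pixels per second
print(f"{bytes_per_second:,} bytes per second")  # 497,664,000 bytes per second
```

Even before any shading or compositing, that is over a hundred million pixel values per second, which helps explain why this kind of work moved to dedicated hardware.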

Enter Graphics Processing Units (GPUs). A GPU consists of hundreds or thousands of small cores and can carry out many calculations in parallel. GPUs have given rise to a whole new level of computing, first used to accelerate graphics-intensive software applications.

In recent years, as machine learning has become more popular, we’ve started using GPUs to accelerate machine learning tasks (such as training models on huge amounts of data, testing them, and making predictions), cutting computation time by taking advantage of their massively parallel processing.
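The property GPUs exploit is that many ML computations split into independent pieces. The Python sketch below shows this for a matrix-vector product, the core operation of a fully connected neural network layer: each output element is its own dot product, so the rows can be handed to separate workers. The thread pool here only illustrates the split (CPython threads won’t speed up pure-Python arithmetic); a GPU runs such pieces on many cores at once.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

def dot(row: List[float], vec: List[float]) -> float:
    """One output element: a multiply-accumulate over a single row."""
    return sum(a * b for a, b in zip(row, vec))

def matvec_sequential(matrix: List[List[float]], vec: List[float]) -> List[float]:
    # A single core computes the outputs one after another.
    return [dot(row, vec) for row in matrix]

def matvec_parallel(matrix: List[List[float]], vec: List[float], workers: int = 4) -> List[float]:
    # Rows don't depend on each other, so each can go to a separate worker.
    # pool.map returns results in the original row order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: dot(row, vec), matrix))

if __name__ == "__main__":
    weights = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 0, -1]]
    inputs = [1, 1, 1]
    print(matvec_sequential(weights, inputs))  # [6, 15, 24, 0]
    print(matvec_parallel(weights, inputs))    # [6, 15, 24, 0]
```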

The wide availability of GPUs in almost every modern computer has made them the go-to option for hardware acceleration.

Moving from Desks to Pockets

Machine learning is currently transitioning from PCs to mobile devices, and more effort is going into making machine learning applications run efficiently on mobile phones.

Companies have started to develop dedicated hardware as well as software tools and frameworks, such as TensorFlow Lite, Core ML, and Caffe, to give mobile developers an easy entry point into machine learning.

Running machine learning on-device has a clear benefit: it makes machine learning more accessible to the general public and much easier to use.

For example, on-device ML could let you apply filters to images wherever you are, or identify the species and condition of the flowers you see and smell. You obviously wouldn’t want to carry your laptop everywhere to do all that.

However, mobile phones have much less powerful CPUs and GPUs than desktop computers and laptops. Making predictions on these lower-power processors can take a few seconds, which is too slow for real-time applications.

In late 2017, major chip makers such as HiSilicon, Qualcomm, and Apple recognized the potential of AI and ML on mobile and started investing more resources in this area. They began making their GPUs more powerful and integrating Neural Processing Units (NPUs) dedicated to on-device machine learning.

Neural Processing Unit (NPU)

According to Wikichip:

Integrating neural processing units into mobile chips enables faster, more power-efficient processing of neural networks on-device. Carrying out such computationally intensive tasks on-device has several advantages:

An NPU on the device removes the dependence on cloud services. This cuts server-side costs, speeds up processing, and makes machine learning available entirely offline, so users with poor internet connectivity can still take full advantage of it.
Not depending on the cloud also means that data is processed on the device itself and never has to leave it, which makes it much more secure.

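Neural processors are built around the operations neural networks rely on most, chiefly huge numbers of multiply-accumulate operations on matrices and tensors. The neural networks they run are themselves loosely inspired by the neurons and synapses of the human brain.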

This design lets complex convolutional neural networks (CNNs) perform many calculations in parallel to quickly recognize and analyze images, audio, video, and text.
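To see why convolutions parallelize so well, here is a minimal single-channel convolution in plain Python (no padding, stride 1), written as the cross-correlation that CNN frameworks typically compute. Every output pixel is an independent sum of multiply-accumulate (MAC) operations. The helper `conv_macs` counts the MACs for one layer; the layer shape in the example is a hypothetical choice for illustration.

```python
from typing import List

Matrix = List[List[int]]

def conv2d_single_channel(image: Matrix, kernel: Matrix) -> Matrix:
    """Valid (no padding, stride 1) 2D convolution as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Each output pixel depends only on the input and the kernel,
            # never on other outputs, so all of them could run in parallel.
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            output[i][j] = acc
    return output

def conv_macs(out_h: int, out_w: int, out_channels: int,
              kernel_size: int, in_channels: int) -> int:
    """Multiply-accumulate operations needed by one convolutional layer."""
    return out_h * out_w * out_channels * kernel_size * kernel_size * in_channels

if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    kernel = [[1, 1], [1, 1]]
    print(conv2d_single_channel(image, kernel))  # [[12, 16], [24, 28]]
    # Hypothetical layer: 224x224 output, 32 output channels, 3x3 kernels, 3 input channels.
    print(f"{conv_macs(224, 224, 32, 3, 3):,} MACs")  # 43,352,064 MACs
```

A single modest layer already needs tens of millions of independent MACs, which is the kind of workload NPUs are designed to execute in parallel.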

Apple (with its Neural Engine), Google (with its Neural Networks API), and Qualcomm (with its Snapdragon Neural Processing Engine SDK) have taken big first steps toward bringing hardware acceleration to their platforms.

Apple’s A12 Bionic Chip

Apple released its first chip with a Neural Engine, the A11 Bionic, in 2017. It was Apple’s most powerful smartphone chip until it was succeeded by the A12 Bionic, released with the iPhone XS. Apple bills the A12 Bionic as the industry’s first 7-nanometer chip.

Along with a hexa-core CPU and a quad-core GPU, Apple has included an 8-core Neural Engine dedicated to neural networks, which lets it run neural networks and other machine learning workloads more energy-efficiently.

Apple claims that the A12 Bionic’s Neural Engine can perform up to 5 trillion operations per second, and that the addition of 6 extra cores makes Core ML up to 9 times faster than on the A11 Bionic.
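A quick back-of-the-envelope calculation shows what these figures could mean. The 5-trillion-operations figure and the “up to 9 times” speedup are Apple’s; the model cost (1 billion operations per inference) and the 90 ms A11 latency are hypothetical numbers chosen for illustration. Real throughput will be well below the peak because of memory bandwidth and other overheads.

```python
def max_inferences_per_second(peak_ops_per_second: float, ops_per_inference: float) -> float:
    """Theoretical ceiling on throughput if every operation ran at peak speed."""
    return peak_ops_per_second / ops_per_inference

def latency_after_speedup(latency_ms: float, speedup: float) -> float:
    """Latency after a given speedup ("N times faster" divides the time by N)."""
    return latency_ms / speedup

NEURAL_ENGINE_PEAK_OPS = 5e12        # Apple's claim: up to 5 trillion operations per second
HYPOTHETICAL_MODEL_OPS = 1e9         # hypothetical model: 1 billion operations per inference
HYPOTHETICAL_A11_LATENCY_MS = 90.0   # hypothetical Core ML latency on the A11 Bionic

ceiling = max_inferences_per_second(NEURAL_ENGINE_PEAK_OPS, HYPOTHETICAL_MODEL_OPS)
best_case_a12_ms = latency_after_speedup(HYPOTHETICAL_A11_LATENCY_MS, 9)  # "up to 9x faster"

print(ceiling)           # 5000.0
print(best_case_a12_ms)  # 10.0
```

Because both of Apple’s numbers are upper bounds (“up to”), results like these are best cases; measuring a real model on a real device is the only way to know its actual speed.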

Apple wants developers to use Core ML and the A12 Bionic’s power to develop new and innovative ML applications.

Apple demonstrates this itself by combining the A12 Bionic’s Neural Engine with another hardware accelerator dedicated to image processing, the Image Signal Processor (ISP), to power a very fast Face ID (the iPhone’s secure 3D face unlock), Animoji and Memoji with real-time 3D face tracking, and augmented reality (AR) apps and games.

The ISP processes the images captured by the camera to make them look natural and detailed, and it enables advanced modes such as Smart HDR and bokeh (depth-of-field) effects.

For the past few years, Apple has arguably led the mobile chip market, mainly because its hardware is tightly integrated with its software. Even so, it isn’t the only company developing chips suited for on-device machine learning.

Huawei’s HiSilicon Kirin 980

HiSilicon’s Kirin processors are widely considered second only to Apple’s chips, and the best option so far in the Android market. The Kirin 970, released in 2017, is what Huawei calls its first mobile AI computing platform.


Huawei has announced the Kirin 980, but it hasn’t yet shipped in any smartphone. Its features are very similar to the A12 Bionic’s, and some observers expect it to match or even outperform Apple’s chip.

According to the announcement, the 7-nanometer (nm) Kirin 980 uses its dual-core NPU to handle AI tasks such as face recognition, object recognition, object detection, image segmentation, and intelligent translation. Huawei says the dual-core NPU can recognize 4,500 images per minute, a 120% improvement in recognition speed over the previous generation.
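Huawei’s figures can be converted into more familiar units. The sketch below assumes that “a 120% improvement” means the new rate is 2.2 times the old one; the implied previous-generation rate is derived from that assumption, not quoted by Huawei.

```python
IMAGES_PER_MINUTE = 4500    # Huawei's quoted Kirin 980 figure
IMPROVEMENT_PERCENT = 120   # "120% improvement in recognition speed"

images_per_second = IMAGES_PER_MINUTE / 60
ms_per_image = 1000 / images_per_second
# Assumption: a 120% improvement means new_rate = old_rate * (1 + 120 / 100).
implied_previous_per_minute = IMAGES_PER_MINUTE * 100 / (100 + IMPROVEMENT_PERCENT)

print(images_per_second)                   # 75.0
print(round(ms_per_image, 1))              # 13.3
print(round(implied_previous_per_minute))  # 2045
```

In other words, the claim works out to roughly one image every 13 milliseconds, fast enough for real-time camera features.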

Kirin chips are Android’s counterpart to Apple’s chips, and these announced advancements could bring revolutionary changes for Android developers.

Google’s Neural Networks API and Image Processing Unit (IPU)

Google has always been more of a software company than a hardware company. It’s remarkable how many benchmarks Google has achieved by relying on software advances. Taking advantage of the vast amount of data it has, Google uses machine learning in almost every product it offers.

Given the wide range of Android devices, Google’s software has to run on a wide variety of chipsets. To keep pace with the desktop-to-mobile AI transition, Google released its Neural Networks API (NNAPI) along with Android 8.1 Oreo in 2017. NNAPI lets Android take advantage of the available hardware to accelerate machine learning.
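The role NNAPI plays can be pictured as a scheduler that sends each operation of a model to an accelerator that supports it and falls back to the CPU otherwise. The Python sketch below models only that general idea; the `Backend` class, the `assign_backends` function, and the operation names are hypothetical and are not part of NNAPI, whose real interface is a C API in the Android NDK.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

@dataclass
class Backend:
    """A hypothetical compute device and the operations it can accelerate."""
    name: str
    supported_ops: Set[str] = field(default_factory=set)
    available: bool = True

def assign_backends(model_ops: List[str], backends: List[Backend],
                    fallback: str = "cpu") -> List[Tuple[str, str]]:
    """Pair each op with the first available backend that supports it.

    Backends are tried in the order given (most preferred first); ops that
    no available backend supports are assigned to the CPU fallback.
    """
    plan = []
    for op in model_ops:
        chosen = fallback
        for backend in backends:
            if backend.available and op in backend.supported_ops:
                chosen = backend.name
                break
        plan.append((op, chosen))
    return plan

if __name__ == "__main__":
    devices = [
        Backend("npu", {"conv2d", "relu"}),
        Backend("gpu", {"conv2d", "relu", "softmax"}, available=False),
    ]
    for op, device in assign_backends(["conv2d", "relu", "softmax"], devices):
        print(op, "->", device)
    # conv2d -> npu
    # relu -> npu
    # softmax -> cpu   (the GPU supports it but is marked unavailable)
```

A preferred accelerator with a CPU fallback is what keeps an app portable across Android’s many chipsets: the model still runs on devices with no accelerator at all, just more slowly.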


Google has shown remarkable results with its machine learning software, such as the Pixel’s AI-powered photography, Google Assistant’s natural language processing (NLP), and Google Lens (an image recognition assistant).

In 2018, Google released Android 9.0 Pie, which integrates machine learning throughout the operating system. It uses machine learning to predict which apps you may use at a particular time, analyzes usage statistics so the OS adapts to you, and much more.

Qualcomm Snapdragon

It’s impossible to leave out the most popular chipmaker in the Android market. Although Qualcomm is primarily a hardware company, it has developed the Snapdragon Neural Processing Engine (NPE) SDK, which lets mobile developers take advantage of the hardware acceleration offered by several Snapdragon chipsets, including the Snapdragon 845, 835, 821, 820, 660, and a few others.


Conclusion

2017 was the year when all the major chip makers started adding hardware accelerators (such as neural processing units) to their processors to enable on-device machine learning. In 2018, we’ve seen the next generation of these chips, and it’s remarkable how much progress has been made so quickly.

So far, on-device machine learning is most visibly used for facial recognition and for image, audio, and video processing.

However, machine learning has seemingly limitless potential that’s yet to be explored. It’s truly an exciting time to be involved in the field.

As a software developer, you may not be able to build your own AI hardware, but you can use software frameworks to add amazing capabilities to your applications.
