While flashy deep learning research grabs headlines, what happens to models after they are trained is equally important. To build a great product, you need to plan for the entire lifecycle of machine learning models, from data collection and training to deployment and monitoring.
This becomes even more critical when deploying ML models outside of the cloud, directly in mobile apps where you face the unique challenges of supporting multiple platforms, hundreds of chipsets, and billions of installs.
The good news is that the same best practices used to create lovable experiences still apply when neural networks are involved. Your tools, though (version control, continuous integration, monitoring, security, and so on), need to be built with mobile machine learning in mind.
Machine Learning Models Lifecycle
Below are seven stages of a model’s lifecycle you need to manage to deliver reliable, scalable mobile experiences.
Gathering an initial dataset is the first step in any machine learning project regardless of where the results will be deployed. When targeting mobile devices, though, think carefully about the conditions in which applications will be used and augment training data accordingly.
A model may achieve high accuracy on the bright images from the ImageNet dataset, but perform poorly in low light settings encountered by smartphone users. Augmenting your data by dimming, blurring, and adding noise to images or collecting them directly from mobile cameras will boost performance of models in production. Smartphones come equipped with a large array of high quality sensors you can use to create your own proprietary data.
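As a minimal sketch of this kind of augmentation (pure Python over a grayscale image represented as nested lists; the function names and parameters are illustrative):

```python
import random

def dim(image, factor=0.4):
    """Scale pixel intensities down to simulate low-light conditions."""
    return [[max(0, min(255, int(p * factor))) for p in row] for row in image]

def add_noise(image, sigma=10.0, seed=None):
    """Add Gaussian sensor noise, clipped to the valid 0-255 range."""
    rng = random.Random(seed)
    return [[max(0, min(255, int(p + rng.gauss(0, sigma)))) for p in row]
            for row in image]

# A tiny 2x2 grayscale "image" of bright pixels, augmented for low light
bright = [[200, 180], [220, 240]]
low_light = add_noise(dim(bright), sigma=5.0, seed=0)
```

In practice you would apply transforms like these randomly during training so the model sees both clean and degraded versions of every image.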
Continue collecting data even after your app is deployed. Capture the inputs and outputs of models running on devices and monitor accuracy to improve models over time. Make sure to respect storage, bandwidth, and connectivity limitations on devices.
You don’t want to fill up storage with cached images or deplete a data allowance streaming video back to the cloud. In some cases you’ll need to filter out sensitive data for greater privacy.
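One way to respect those limits is a bounded on-device cache that evicts the oldest samples and refuses to store anything sensitive. A rough sketch (the class and its hooks are hypothetical, not any particular SDK's API):

```python
import collections
import sys

class SampleLogger:
    """Bounded on-device cache of (input, output) samples.

    Sketch only: evicts the oldest samples so the cache never exceeds
    max_bytes, and a filter hook drops sensitive inputs before
    anything is stored.
    """

    def __init__(self, max_bytes=1_000_000, is_sensitive=lambda x: False):
        self.max_bytes = max_bytes
        self.is_sensitive = is_sensitive
        self.samples = collections.deque()
        self.used = 0

    def log(self, model_input, model_output):
        if self.is_sensitive(model_input):
            return False  # never persist data flagged as sensitive
        size = sys.getsizeof(model_input) + sys.getsizeof(model_output)
        self.samples.append((model_input, model_output, size))
        self.used += size
        while self.used > self.max_bytes:  # evict oldest samples first
            _, _, old_size = self.samples.popleft()
            self.used -= old_size
        return True
```

A real implementation would also gate uploads on Wi-Fi availability and battery state rather than streaming samples back immediately.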
Today, most model training happens in the cloud. Datasets are large, and optimizing hundreds of millions of parameters requires a lot of processing power. In the future, AI-specific mobile processors will enable training directly on mobile devices, keeping user data private and secure. For now, a variety of cloud-based training platforms support exporting trained models directly to mobile-friendly formats like Core ML or TensorFlow Lite.
Regardless of where your model is being trained, the best results are achieved when the training environment matches deployment as closely as possible. Make sure to simulate optimizations like quantization inside the training loop to keep accuracy high. Finally, you need to keep track of metadata related to each model you train, including the datasets used, hyperparameters, the platform it’s targeting, and any other versioning information.
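For example, simulated (fake) quantization rounds weights to their fixed-point values during the forward pass while keeping float storage, so training sees the same rounding error that deployment will. A toy sketch for a single weight, plus the kind of metadata record worth saving with each run (the dataset and target names are hypothetical):

```python
def fake_quantize(w, bits=8, w_min=-1.0, w_max=1.0):
    """Round one weight to the nearest of 2**bits fixed-point levels,
    but return a float so training can keep using it (real frameworks
    pass gradients through via a straight-through estimator)."""
    levels = 2 ** bits - 1
    scale = (w_max - w_min) / levels
    clipped = max(w_min, min(w_max, w))
    return w_min + round((clipped - w_min) / scale) * scale

# Metadata worth recording alongside every training run
# (dataset name and target strings below are hypothetical)
run_metadata = {
    "dataset": "camera_lowlight_v3",
    "hyperparameters": {"learning_rate": 1e-3, "epochs": 20},
    "target_platform": "tensorflow-lite-android",
    "quantization_bits": 8,
}
```

Storing a record like this with every trained artifact makes it possible to reproduce, compare, and roll back models later.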
Optimizing models for mobile usage is critical to maintaining smooth, reliable user experiences. Milliseconds matter for cameras processing live video. Building models that run on battery-efficient mobile processors requires taking advantage of optimization techniques:
- Architecture: Choose an architecture specifically designed to perform well on mobile devices (e.g. SqueezeNet or MobileNet for computer vision use cases).
- Pruning: Remove model parameters that aren’t important to reduce computation cost.
- Compression: Quantize parameters by switching to fixed-point representations to shrink model size.
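The pruning and compression steps above can be sketched in a few lines (magnitude pruning and symmetric 8-bit quantization over a flat weight list; a real pipeline would work tensor by tensor):

```python
def prune(weights, fraction=0.5):
    """Magnitude pruning: zero out the smallest-magnitude weights."""
    k = int(len(weights) * fraction)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

def quantize_int8(weights):
    """Symmetric 8-bit quantization: floats become one int8 each plus
    a single shared float scale, roughly a 4x size cut vs float32."""
    scale = max(abs(w) for w in weights) / 127
    if scale == 0:
        scale = 1.0
    return [round(w / scale) for w in weights], scale

# Prune half the weights, then quantize what remains
sparse = prune([0.1, -0.2, 0.3, -0.4], fraction=0.5)
quantized, scale = quantize_int8(sparse)
```

Zeroed weights compress well and can be skipped at inference time on hardware that supports sparsity, which is where the runtime savings come from.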
Chip manufacturers are constantly updating low level routines that perform machine learning computations. Model compilation should target specific chipsets to take advantage of additional optimizations. In many cases, this means creating multiple versions of models in platform-specific formats. Each new version needs to be tested on different platforms to measure results on different hardware.
Model optimization requires product managers and developers to make tradeoffs between size, runtime performance, and accuracy, all of which impact core user experience. Set clear priorities and establish requirements for each.
Models written and trained in server side frameworks and languages aren’t always compatible with mobile devices. Mobile platforms like iOS and Android require specific formats to take advantage of hardware acceleration.
Converting models for each target platform can be a tedious, fragile, and time-consuming process that requires repetitive code. Mobile frameworks for machine learning are still in their infancy, and standards can take years to gain adoption.
Proprietary datasets and model architectures are valuable intellectual property. Deploying models to cloud environments provides a high degree of security and protection. When models are deployed on mobile devices, however, the risk of someone gaining access to your IP is much higher. Take steps to protect IP by encrypting or obfuscating machine learning models that run directly on devices.
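As an illustration of the idea (not production crypto: use a vetted cipher such as AES, with keys held in the platform keystore), a serialized model can be XORed with a keystream derived from a secret:

```python
import hashlib
from itertools import count

def keystream(secret: bytes, n: int) -> bytes:
    """Derive n pseudo-random bytes from a secret via SHA-256 in
    counter mode. Illustration only: ship real models behind a
    vetted cipher with keys in the platform keystore."""
    out = b""
    for i in count():
        if len(out) >= n:
            return out[:n]
        out += hashlib.sha256(secret + i.to_bytes(4, "big")).digest()

def obfuscate(model_bytes: bytes, secret: bytes) -> bytes:
    """XOR the serialized model with the keystream; running the same
    transform again recovers the original bytes."""
    ks = keystream(secret, len(model_bytes))
    return bytes(a ^ b for a, b in zip(model_bytes, ks))
```

Even lightweight obfuscation like this stops a model from being lifted straight out of an unpacked app bundle, though a determined attacker with the device secret can still recover it.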
Every mobile platform has a different set of APIs for integrating and executing models in an app. Making these methods and any pre- and post-processing consistent across platforms improves the maintainability of your code and reduces errors.
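One way to keep pre-processing consistent is to define it once as a small, platform-neutral spec that every port must reproduce exactly. A sketch, with hypothetical constants (these happen to be the common ImageNet normalization values; your model's spec may differ):

```python
# A single canonical spec shared by training code and every mobile port.
PREPROCESS_SPEC = {
    "input_size": (224, 224),
    "mean": (0.485, 0.456, 0.406),
    "std": (0.229, 0.224, 0.225),
}

def normalize_pixel(rgb, spec=PREPROCESS_SPEC):
    """Normalize one RGB pixel (0-255 channels) exactly as training did.
    Each platform implementation must reproduce this arithmetic."""
    return tuple((c / 255.0 - m) / s
                 for c, m, s in zip(rgb, spec["mean"], spec["std"]))
```

Checking the iOS and Android ports against reference outputs from a function like this catches the subtle mismatches (channel order, scaling, rounding) that silently degrade accuracy.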
You also have to choose whether to bundle your models with your application code or download them at runtime. Bundling makes for a smoother user experience but a larger package size. Downloading at runtime gives more flexibility but increases bandwidth usage.
When machine learning is powering core user experiences, model rollouts are feature rollouts. The best practices for shipping products still apply. Updates should roll out over the air at times when they won’t disrupt users, be released to a small fraction of devices to make sure performance is acceptable, and A/B tested in situations where user behavior might be impacted.
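Releasing to a small fraction of devices can be done with deterministic hash bucketing, so each device consistently falls in or out of the rollout for a given model version. A minimal sketch (function name and scheme are illustrative):

```python
import hashlib

def in_rollout(device_id: str, model_version: str, fraction: float) -> bool:
    """Deterministic staged-rollout bucketing: a device's answer never
    changes for a given model version, and roughly `fraction` of all
    devices are included."""
    digest = hashlib.sha256(f"{model_version}:{device_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2 ** 32
    return bucket < fraction
```

Raising `fraction` in stages (1%, 10%, 50%, 100%) only ever adds devices to the rollout, and the same bucketing works for assigning A/B test arms.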
Don’t put a model in your app without a way to monitor it. Unlike cloud environments, where you have robust logging infrastructure, deploying onto mobile devices often leaves you flying blind. You need a system for measuring runtime performance, memory usage, battery drain, and accuracy, all of it across heterogeneous hardware.
If your model is supposed to analyze video in real time, make sure it runs at 30 frames per second on all the target mobile devices. Set up alerts that notify you when there are significant changes to input data, predictions that may indicate a failure, or a change in the way people are using your application.
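A simple on-device check for the frame-rate requirement is a sliding-window latency monitor that flags devices falling below the target. A sketch (class name and thresholds are illustrative):

```python
from collections import deque

class FrameRateMonitor:
    """Sliding-window latency tracker that flags devices falling
    below a target frame rate."""

    def __init__(self, target_fps=30, window=120):
        self.budget_ms = 1000.0 / target_fps   # per-frame time budget
        self.latencies = deque(maxlen=window)

    def record(self, latency_ms):
        self.latencies.append(latency_ms)

    def healthy(self):
        if not self.latencies:
            return True
        return sum(self.latencies) / len(self.latencies) <= self.budget_ms
```

The same windowed pattern extends to memory, battery, and prediction-distribution drift: compare a recent window against a baseline and raise an alert when they diverge.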