Though it didn’t get a ton of stage time during the keynote at this year’s WWDC, there’s a lot to be excited about in the latest iteration of Apple’s machine learning framework, Core ML 3.
In this post, I’ll highlight the biggest changes to the software and discuss their implications for developers and machine learning engineers.
On-device training is here
The biggest addition to Core ML is the introduction of on-device training. Prior to this release, Core ML supported inference only.
Training was done server-side with a framework like TensorFlow or PyTorch, and models were converted to Core ML in order to make predictions in an app. Core ML 3 changes that, becoming the first widely available machine learning framework to support both inference and training directly on-device.
Training is accomplished with a combination of mlmodel attributes and API calls. An isUpdatable flag can be added to models and individual layers to denote where parameter updates can be made.
Training inputs and outputs, loss functions, and hyperparameters such as the learning rate can all be defined in the model's protobuf specification.
Developers use MLUpdateTask to perform the training itself. Apple provides MLUpdateProgressHandlers that act like Keras callbacks, allowing arbitrary code to run at various points during training, such as when an epoch ends. Because some updates may take longer than a few seconds, Core ML tasks can now be run in the background for a short period of time.
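To make the moving parts concrete, here's a minimal sketch in plain Python of the pattern described above: layers carry an updatable flag, a learning rate and a squared-error loss drive the update, and handlers fire on events such as the end of an epoch. Everything in it (`Layer`, `UpdateContext`, `run_update`, the event names) is hypothetical and only mirrors the concepts. It is not the Core ML API, which you'd call from Swift or Objective-C via MLUpdateTask.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for the concepts above; these are NOT Core ML types.
@dataclass
class Layer:
    name: str
    weight: float
    is_updatable: bool = False  # plays the role of Core ML's isUpdatable flag


@dataclass
class UpdateContext:
    event: str  # e.g. "training_begin" or "epoch_end" (hypothetical names)
    epoch: int
    loss: float


Handler = Callable[[UpdateContext], None]


def run_update(layers: List[Layer], data: List[Tuple[float, float]],
               learning_rate: float, epochs: int,
               handlers: Dict[str, Handler]) -> None:
    """Per-sample SGD on squared error; only layers flagged updatable change."""
    if "training_begin" in handlers:
        handlers["training_begin"](UpdateContext("training_begin", 0, float("nan")))
    for epoch in range(epochs):
        total = 0.0
        for x, y in data:
            # Forward pass: each toy layer scales its input by one weight.
            acts = [x]
            for layer in layers:
                acts.append(layer.weight * acts[-1])
            err = acts[-1] - y
            total += err * err
            # Backward pass: chain gradients from the output back to the input.
            grad = 2.0 * err  # d(err^2) / d(prediction)
            for i in reversed(range(len(layers))):
                layer = layers[i]
                grad_w = grad * acts[i]     # gradient w.r.t. this layer's weight
                grad = grad * layer.weight  # gradient w.r.t. this layer's input (old weight)
                if layer.is_updatable:
                    layer.weight -= learning_rate * grad_w
        # Average per-sample squared error seen during this epoch.
        if "epoch_end" in handlers:
            handlers["epoch_end"](UpdateContext("epoch_end", epoch, total / len(data)))


# Usage: a frozen "backbone" and an updatable "head" learning y = 6x.
layers = [Layer("backbone", 2.0), Layer("head", 0.0, is_updatable=True)]
losses: List[float] = []
run_update(layers, [(1.0, 6.0), (2.0, 12.0)], learning_rate=0.01, epochs=20,
           handlers={"epoch_end": lambda ctx: losses.append(ctx.loss)})
print(layers[0].weight, round(layers[1].weight, 2))  # backbone unchanged; head near 3.0
```

The epoch-end handler here only records the loss, but it could just as well checkpoint the model or update a progress bar, which is the kind of job the Keras-callback comparison points at.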
Apple telegraphed its intended use case for on-device training by talking almost exclusively about “model personalization”. Rather than using a one-size-fits-all model for everyone, developers are encouraged to tailor models and experiences to individual users.
In a demo during the Core ML breakout session, an Apple engineer created a shortcut feature for teachers using a paper-grading app. A teacher marking an assignment with an Apple Pencil could define custom sketches that would be recognized by a model and transformed into emoji stickers.
New shortcuts were made by drawing a few example sketches, which were used to train a small k-nearest neighbors model to predict the desired emoji to output.
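A kNN model suits this kind of personalization because "training" amounts to storing a handful of labeled examples. Below is a minimal Python sketch of the idea, assuming each sketch has already been turned into a small feature vector (for example by a fixed, pretrained feature extractor, which isn't shown). The class, the 2-D feature values, and the labels are hypothetical; this is not Apple's demo code or Core ML's NearestNeighbors implementation.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


class NearestNeighborsClassifier:
    """Minimal k-NN: 'training' just stores labeled feature vectors."""

    def __init__(self, k: int = 3) -> None:
        self.k = k
        self.examples: List[Tuple[Tuple[float, ...], str]] = []

    def add_example(self, features: Sequence[float], label: str) -> None:
        self.examples.append((tuple(features), label))

    def predict(self, features: Sequence[float]) -> str:
        if not self.examples:
            raise ValueError("no examples added yet")
        # Take the k stored examples closest to the query (Euclidean distance).
        nearest = sorted(self.examples,
                         key=lambda ex: math.dist(ex[0], features))[: self.k]
        # Majority vote; on a tie, most_common keeps first-seen order,
        # so the label of the closer neighbor wins.
        return Counter(label for _, label in nearest).most_common(1)[0][0]


# Usage with made-up 2-D features for two custom shortcuts.
clf = NearestNeighborsClassifier(k=3)
for f in [(0.1, 0.2), (0.0, 0.1), (0.2, 0.1)]:
    clf.add_example(f, "star")
for f in [(0.9, 0.8), (1.0, 0.9), (0.8, 1.0)]:
    clf.add_example(f, "check")
print(clf.predict((0.15, 0.15)))  # star
print(clf.predict((0.95, 0.9)))   # check
```

Adding a new shortcut is just more `add_example` calls, with no gradient descent involved, which is why a few example drawings are enough.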

While model personalization is one use case for on-device training, it’s not the only one. Recent research into distributed training techniques like federated learning has demonstrated the potential to train accurate models from scratch using small updates from many devices. Core ML 3 opens the door to implementing these approaches on Apple devices and their powerful hardware.
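At the heart of one well-known federated learning algorithm, federated averaging (FedAvg), is a weighted average computed on the server: each device trains locally and reports its weights plus how many examples it used. Here's a minimal sketch of just that aggregation step in Python. It's not part of Core ML; local training, transport, and privacy protections are all out of scope, and the function name and numbers are illustrative.

```python
from typing import List, Sequence, Tuple


def federated_average(updates: Sequence[Tuple[Sequence[float], int]]) -> List[float]:
    """Server-side FedAvg step: average client weights, weighted by example count."""
    if not updates:
        raise ValueError("no client updates")
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("clients reported no training examples")
    dim = len(updates[0][0])
    averaged = [0.0] * dim
    for weights, n in updates:
        if len(weights) != dim:
            raise ValueError("all clients must send the same number of weights")
        for i, w in enumerate(weights):
            averaged[i] += w * n / total
    return averaged


# Two hypothetical devices: one trained on 10 examples, the other on 30.
print(federated_average([([1.0, 2.0], 10), ([3.0, 4.0], 30)]))  # [2.5, 3.5]
```

Weighting by example count keeps a device that saw only a few examples from pulling the shared model as hard as one that saw many.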
Training models on-device has major benefits to both users and developers. Users get more personal experiences that improve over time. And because everything runs on-device, it works with or without internet connectivity.
Most importantly, though, user data stays safely on-device and is never transferred to a third party. Developers don’t need to manage large analytics clusters for training models or deal with transferring and securing training data.
Realizing the benefits of on-device training also requires some planning and technical choices. Mobile devices won’t have the same compute or memory resources as large cloud clusters, so it’s important to think carefully about what to update and when.
Rather than updating weights for every layer in a large model, Core ML 3 allows specific layers to be tagged as updatable. On-device training architectures should make use of global, static feature extractors and smaller, updatable blocks at the top of models for personalization.
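Some back-of-the-envelope arithmetic shows why this matters. A fully connected layer has one weight per input/output pair plus one bias per output, so freezing the backbone dramatically shrinks the number of parameters an update touches. The layer sizes below are hypothetical, chosen only to illustrate the ratio; real models mix convolutional and other layer types.

```python
from typing import NamedTuple, Sequence, Tuple


class Dense(NamedTuple):
    name: str
    inputs: int
    outputs: int
    updatable: bool

    @property
    def params(self) -> int:
        # One weight per input/output pair plus one bias per output.
        return self.inputs * self.outputs + self.outputs


def count_params(layers: Sequence[Dense]) -> Tuple[int, int]:
    """Return (updatable parameters, total parameters)."""
    trainable = sum(layer.params for layer in layers if layer.updatable)
    total = sum(layer.params for layer in layers)
    return trainable, total


# Hypothetical model: a frozen feature extractor plus a small updatable head.
model = [
    Dense("backbone_1", 2048, 1024, updatable=False),
    Dense("backbone_2", 1024, 512, updatable=False),
    Dense("head", 512, 10, updatable=True),
]
trainable, total = count_params(model)
print(f"{trainable:,} of {total:,} parameters updatable ({trainable / total:.2%})")
```

In this toy setup only about 0.2% of the parameters need gradients and optimizer state, which is what makes on-device updates practical on a phone.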
New layers, more possibilities
In addition to on-device training, Core ML 3 also brings support for a number of new architectures, layer types, and operations that open the door for complex models and use cases. These updates aren’t always flashy, but they make a huge difference in the framework’s utility.
New models introduced in Core ML 3 include:
- NearestNeighbors.proto — Nearest neighbors classifiers (kNN) are simple, efficient models for labeling and clustering that work great for model personalization.
- ItemSimilarityRecommender.proto — Recommender models that take a list of items and scores and use a table of item-to-item similarities to predict scores for all items, which can then be used to make recommendations.
- SoundAnalysisPreprocessing.proto — Built-in operations to perform common pre-processing tasks on audio. For example, transforming waveforms into the frequency domain.
- LinkedModels.proto — Shared backbones and feature extractors that can be reused across multiple models.
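To give a feel for how an item-similarity recommender works, here's a simplified Python sketch: each item the user has interacted with contributes its score times its similarity to every candidate item, and items the user already has are skipped. The similarity table, item names, and function are hypothetical, and this shows the general idea rather than the exact scoring rule defined in ItemSimilarityRecommender.proto.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical similarity table: item -> {similar item: similarity score}.
Similarities = Dict[str, Dict[str, float]]


def recommend(similar: Similarities, user_items: Dict[str, float],
              k: int) -> List[Tuple[str, float]]:
    """Score unseen items by summing user score * item similarity."""
    scores: Dict[str, float] = defaultdict(float)
    for item, user_score in user_items.items():
        for other, sim in similar.get(item, {}).items():
            if other not in user_items:  # skip items the user already has
                scores[other] += user_score * sim
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


similar = {
    "tent": {"stove": 0.8, "lantern": 0.5},
    "stove": {"tent": 0.8, "pan": 0.9},
}
print(recommend(similar, {"tent": 1.0}, k=2))  # [('stove', 0.8), ('lantern', 0.5)]
```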
These new models enable use cases beyond computer vision and provide additional benefits for developers using multiple models in the same application.
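As an illustration of the frequency-domain transform mentioned for SoundAnalysisPreprocessing, here's a naive discrete Fourier transform in Python that turns a short waveform into per-bin magnitudes. It's O(n²) and purely illustrative. Core ML's built-in audio preprocessing is Apple's own pipeline and is not reproduced here, so don't treat this sketch as equivalent.

```python
import cmath
import math
from typing import List, Sequence


def dft_magnitudes(samples: Sequence[float]) -> List[float]:
    """Naive O(n^2) DFT; returns magnitudes of the non-negative frequency bins."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):  # for real input, higher bins mirror these
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


# 8 samples of a sine wave completing 2 cycles: energy lands in bin 2.
wave = [math.sin(2 * math.pi * 2 * t / 8) for t in range(8)]
mags = dft_magnitudes(wave)
print(max(range(len(mags)), key=mags.__getitem__))  # 2
```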
Core ML 3 officially supports over 100 neural network layer types. As an appendix to this article, I’ve included a comprehensive list of new layers added to the NeuralNetwork.proto file. You can guess what most of them do from their names, but for detailed descriptions, check out this fantastic deep dive by Matthijs Hollemans.
The most exciting change is support for many more NumPy-like operations for manipulating MLMultiArrays.
This will make it much easier to port complex pre- and post-processing into mobile apps. Core ML also now supports dynamic graphs, including loops and branches.
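Many of the new layers in the appendix carry a Broadcastable suffix. Assuming they follow NumPy's broadcasting convention (the name suggests so, but the comments in NeuralNetwork.proto are the authoritative source), the output shape of an elementwise binary operation can be computed like this: align the shapes from the trailing dimension, and each pair of sizes must either match or include a 1.

```python
from itertools import zip_longest
from typing import Sequence, Tuple


def broadcast_shape(a: Sequence[int], b: Sequence[int]) -> Tuple[int, ...]:
    """NumPy-style broadcasting of two shapes (assumed to match Core ML's rules)."""
    result = []
    # Walk both shapes from the last dimension; missing leading dims count as 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or y == 1:
            result.append(x)
        elif x == 1:
            result.append(y)
        else:
            raise ValueError(f"cannot broadcast dimensions {x} and {y}")
    return tuple(reversed(result))


print(broadcast_shape((8, 1, 6, 1), (7, 1, 5)))  # (8, 7, 6, 5)
```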
The upshot of all this hard work by the Core ML team is that most of the state-of-the-art models making headlines over the past year are now fully compatible with Core ML.

Many of these architectures still need to be shrunk and optimized for mobile use (the weights for a fully trained BERT model can be over 1GB in size), but it’s exciting to see so many possibilities available.
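The size figure is easy to sanity-check: weight storage is roughly the parameter count times the bytes per parameter. Using the roughly 340 million parameters reported for BERT-Large (BERT-Base has about 110 million), here's the arithmetic at a few precisions. It ignores file-format overhead and assumes every weight is stored at the same precision.

```python
def weight_megabytes(num_params: int, bytes_per_param: int) -> float:
    """Approximate weight storage in megabytes (10^6 bytes), ignoring overhead."""
    return num_params * bytes_per_param / 1e6


BERT_LARGE_PARAMS = 340_000_000  # approximate figure reported for BERT-Large

for label, nbytes in [("32-bit float", 4), ("16-bit float", 2), ("8-bit", 1)]:
    print(f"{label}: {weight_megabytes(BERT_LARGE_PARAMS, nbytes):,.0f} MB")
# 32-bit float: 1,360 MB / 16-bit float: 680 MB / 8-bit: 340 MB
```

Reducing precision is one of the simplest levers for shrinking a model, though how much accuracy survives depends on the model.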
Finally, the addition of new layers means that conversion tools have also gotten more robust. Converting models from Keras, TensorFlow, and PyTorch should be a much smoother process with fewer custom workarounds.
Implications
This release marks a major step forward for machine learning in the Apple ecosystem, and there are a number of implications for developers.
- Core ML is ready to move beyond computer vision. Image-related tasks have dominated deep learning, and specifically mobile deep learning, for a few years now. Support for audio preprocessing, generic recommender models, and the complex operations required by this year’s crop of NLP models promises to change that. Developers should start thinking about ML-powered experiences for users that go beyond the camera.
- On-device training will demand new UX and design patterns. How much data is needed to personalize a model with sufficient accuracy? What’s the best way to solicit training data from users? How often should this be done? As ML moves closer and closer to core application logic, developers need to think about how these features are communicated to users.
- Personalized models will need persistence and syncing. Training data for personalized models will remain on-device, but the model itself may need to be stored elsewhere. If a user deletes and then reinstalls an app, or wants to use the same app on multiple devices, their personalization should go with them. Developers will need systems to back up and sync models.
- It’s now possible to do end-to-end machine learning and skip Python. Python has been the preferred programming language of ML engineers for nearly a decade now. With the ability to train models, Core ML + Swift is now a viable alternative for some projects. Will mobile developers skip Python entirely and opt for a language they already know? Time will tell.
Resources
For more information on Core ML 3, check out the following resources.
- An in-depth look at Core ML 3 — Matthijs Hollemans
- Core ML 3 Framework — Apple (Video)
- Core ML — Apple Developer Documentation
- Coremltools — Release Notes
Appendix — New Core ML 3 Layers
Control Flow Layers:
- CopyLayer
- BranchLayer
- LoopLayer
- LoopBreakLayer
- LoopContinueLayer
- RangeStaticLayer
- RangeDynamicLayer
Elementwise Unary Layers:
- ClipLayer
- CeilLayer
- FloorLayer
- SignLayer
- RoundLayer
- Exp2Layer
- SinLayer
- CosLayer
- TanLayer
- AsinLayer
- AcosLayer
- AtanLayer
- SinhLayer
- CoshLayer
- TanhLayer
- AsinhLayer
- AcoshLayer
- AtanhLayer
- ErfLayer
- GeluLayer
Elementwise Binary with Broadcasting Support
- EqualLayer
- NotEqualLayer
- LessThanLayer
- LessEqualLayer
- GreaterThanLayer
- GreaterEqualLayer
- LogicalOrLayer
- LogicalXorLayer
- LogicalNotLayer
- LogicalAndLayer
- ModBroadcastableLayer
- MinBroadcastableLayer
- MaxBroadcastableLayer
- AddBroadcastableLayer
- PowBroadcastableLayer
- DivideBroadcastableLayer
- FloorDivBroadcastableLayer
- MultiplyBroadcastableLayer
- SubtractBroadcastableLayer
Tensor Manipulations
- TileLayer
- StackLayer
- GatherLayer
- ScatterLayer
- GatherNDLayer
- ScatterNDLayer
- SoftmaxNDLayer
- GatherAlongAxisLayer
- ScatterAlongAxisLayer
- ReverseLayer
- ReverseSeqLayer
- SplitNDLayer
- ConcatNDLayer
- TransposeLayer
- SliceStaticLayer
- SliceDynamicLayer
- SlidingWindowsLayer
- TopKLayer
- ArgMinLayer
- ArgMaxLayer
- EmbeddingNDLayer
- BatchedMatMulLayer
Tensor Allocation / Reshape Layers
- GetShapeLayer
- LoadConstantNDLayer
- FillLikeLayer
- FillStaticLayer
- FillDynamicLayer
- BroadcastToLikeLayer
- BroadcastToStaticLayer
- BroadcastToDynamicLayer
- SqueezeLayer
- ExpandDimsLayer
- FlattenTo2DLayer
- ReshapeLikeLayer
- ReshapeStaticLayer
- ReshapeDynamicLayer
- RankPreservingReshapeLayer
Random Distributions
- RandomNormalLikeLayer
- RandomNormalStaticLayer
- RandomNormalDynamicLayer
- RandomUniformLikeLayer
- RandomUniformStaticLayer
- RandomUniformDynamicLayer
- RandomBernoulliLikeLayer
- RandomBernoulliStaticLayer
- RandomBernoulliDynamicLayer
- CategoricalDistributionLayer
Reduction-Related Layers:
- ReduceL1Layer
- ReduceL2Layer
- ReduceMaxLayer
- ReduceMinLayer
- ReduceSumLayer
- ReduceProdLayer
- ReduceMeanLayer
- ReduceLogSumLayer
- ReduceSumSquareLayer
- ReduceLogSumExpLayer
Masking / Selection Layers
- WhereNonZeroLayer
- MatrixBandPartLayer
- LowerTriangularLayer
- UpperTriangularLayer
- WhereBroadcastableLayer
Normalization Layers
- LayerNormalizationLayer
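For reference, here are textbook definitions of two of the layers above, GeluLayer and LayerNormalizationLayer, written in plain Python over lists. These are the standard formulas (exact erf-based GELU; layer normalization with a learned scale gamma, shift beta, and a small epsilon for numerical stability), not Core ML's implementation. The layers' exact parameters and any approximation modes are defined in NeuralNetwork.proto.

```python
import math
from typing import List, Sequence


def gelu(x: float) -> float:
    """Exact GELU: x times the standard normal CDF of x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def layer_norm(x: Sequence[float], gamma: Sequence[float],
               beta: Sequence[float], eps: float = 1e-5) -> List[float]:
    """Normalize one vector to zero mean and unit variance, then scale and shift."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # population variance
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]


print(round(gelu(1.0), 4))  # 0.8413
print([round(v, 3) for v in layer_norm([1.0, 2.0, 3.0, 4.0], [1.0] * 4, [0.0] * 4)])
```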