Provide intelligence to mobile apps
In this article, we’ll discuss some foundational concepts in machine learning (ML) that are particularly important for mobile developers interested in working with ML.
Mobile devices provide four different input sources that can be used for machine learning. These sources are:
- Camera: ML can augment or analyze images and video captured by the device's camera. We can use it to detect landmarks, objects, or faces in images and video, and to recognize handwriting and printed text. ML can also track motion and poses, recognize gestures, understand emotional context, and much more.
- Text: ML can analyze and classify text to understand the sentiment, meaning, or structure of a sentence or phrase.
- Speech: ML can convert speech into text for dictation.
- Activity: ML can also leverage a device’s gyroscope, accelerometer, altimeter, magnetometer, and GPS to analyze the activity of a user.
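As a toy illustration of the activity source, the sketch below classifies a window of accelerometer readings as "still" or "moving" by thresholding the variance of the acceleration magnitude. The readings are synthetic (no real sensor API is involved), and the threshold is an arbitrary value chosen for this illustration:

```python
import numpy as np

def classify_activity(samples, threshold=0.05):
    """Label a window of (x, y, z) accelerometer samples.

    High variance in the acceleration magnitude suggests movement;
    a near-constant magnitude (just gravity) suggests the device is still.
    """
    magnitudes = np.linalg.norm(samples, axis=1)
    return "moving" if magnitudes.var() > threshold else "still"

# Synthetic windows: a resting device reads roughly (0, 0, 1g);
# a shaken device shows large fluctuations on every axis.
rng = np.random.default_rng(0)
still = np.tile([0.0, 0.0, 1.0], (50, 1)) + rng.normal(0, 0.01, (50, 3))
moving = rng.normal(0, 1.0, (50, 3))

print(classify_activity(still))   # still
print(classify_activity(moving))  # moving
```

A real app would of course train a model on many labeled motion traces rather than hand-pick a threshold, but the input/output shape is the same.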
Machine learning fundamentals
Before we dig deeper, let's cover some fundamental concepts that will help us better understand ML. The first and foremost building block of machine learning is the model.
A machine learning model is a combination of an algorithm that’s taught by a computer to perform a specific task, and the data that’s used by the algorithm to train itself.
We call this a model because it “models” the domain for a given problem. For example, while trying to identify faces of our friends in a given image, the problem domain is digital images of humans. The respective model corresponding to this problem domain will contain everything to make sense of these images.
The first thing needed to create a model is the algorithm. Then, we use this algorithm to train the model by showing it a large number of images related to the problem that we want to solve.
But creating a model can be quite tricky and resource-intensive, so the first step is often to acquire a pre-built model. Many such models can easily be found online and converted to the Core ML model format using tools such as the TensorFlow converter and Core ML Tools.
In our example, the model would need images of our friends and all the things that we want the model to learn from those images, such as their names. Once we have both the data and algorithm we can begin training the model.
The training data must contain the correct answer, which is known as a target variable or target attribute. The learning algorithm finds patterns in the training data that map the input data attributes to the target (the answer that you want to predict), and it outputs an ML model that captures these patterns.
Once training is complete, a model contains the knowledge about the problem that was extracted by the algorithm from the images, and hence we can use it to find out answers to unknown questions. This is called inference.
After training, if the model is able to predict names of our friends for a given input image, then we can say that the model generalizes to the task we’ve provided it, as expected.
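The whole train, infer, generalize loop can be sketched in a few lines of scikit-learn. As a stand-in for the friend-photos example, this sketch uses the library's bundled handwritten-digits dataset, where each image already comes with its target label:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Labeled data: each 8x8 image comes with its target (the digit it shows).
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Training: the algorithm finds patterns mapping pixel values to targets.
model = LogisticRegression(max_iter=2000)
model.fit(X_train, y_train)

# Inference: ask the trained model about an image it has never seen.
prediction = model.predict(X_test[:1])

# Generalization: accuracy on held-out data shows whether the learned
# patterns carry over beyond the training images.
accuracy = model.score(X_test, y_test)
print(prediction, round(accuracy, 2))
```

If the held-out accuracy is high, the model generalizes to the task, just as a good face model would correctly name friends in photos it never trained on.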
Types of Machine Learning
Machine learning comes in many different flavors, depending on the algorithm and its objectives. You can divide machine learning algorithms into three main groups based on their purpose:
- Supervised learning
- Unsupervised learning
- Reinforcement learning
The image below depicts the difference between the three:
In the following sections, we'll discuss supervised learning in more detail.
Supervised learning is the most common learning type in machine learning practice. Human intervention is required for supervised learning, as the algorithm needs training data that’s labeled. This process requires the model to be fed with labeled examples—in our case, labeled images of our friends.
These labels tell the model what or who is in a given image.
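In code, a labeled training set is simply inputs paired with their answers. A minimal sketch (the file names and labels below are hypothetical placeholders):

```python
# Each example pairs an input (an image file) with its label (a name).
training_data = [
    ("img_001.jpg", "Alice"),
    ("img_002.jpg", "Bob"),
    ("img_003.jpg", "Alice"),
]

# The distinct labels are the classes the model will learn to predict.
classes = sorted({label for _, label in training_data})
print(classes)  # ['Alice', 'Bob']
```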
Types of Supervised learning
Supervised learning is categorized into the following two types:
- Classification: A classification problem is when the output variable is a category, such as “red” or “blue” or “disease” and “no disease”. A classification model attempts to draw some conclusion from observed values. Given one or more inputs, a classification model will try to predict the value of one or more outcomes.
- Regression: Regression models are used to predict a continuous value. Predicting the price of a house given its features, such as size and location, is one of the most common examples of regression.
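The two flavors can be contrasted side by side in scikit-learn: a classifier predicts a category, while a regressor predicts a continuous number. The color features and house prices below are made up purely for illustration:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression

# Classification: map (hue, brightness) features to a color category.
X_cls = np.array([[0.9, 0.2], [0.85, 0.3], [0.1, 0.8], [0.15, 0.9]])
y_cls = ["red", "red", "blue", "blue"]
clf = DecisionTreeClassifier().fit(X_cls, y_cls)
print(clf.predict([[0.88, 0.25]]))  # a category, e.g. 'red'

# Regression: map house size (sq ft) and room count to a continuous price.
X_reg = np.array([[1000, 2], [1500, 3], [2000, 3], [2500, 4]])
y_reg = [200_000, 290_000, 380_000, 470_000]
reg = LinearRegression().fit(X_reg, y_reg)
print(reg.predict([[1800, 3]]))  # a number, not a category
```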
Classification is a specific sub-area of supervised learning and can be used to classify text, images/video, and sound. This makes classification suitable for different tasks across different problem statements. And it’s what we’ll be working with in our example.
Classification techniques predict discrete responses or categories, such as whether or not an email contains spam. The output of a classification model is a discrete label: SPAM or NOT SPAM, for instance, or, in the case of our earlier example, the name of one of our friends.
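A minimal spam classifier makes the discrete output concrete. The handful of training messages below are made up for illustration; a bag-of-words representation plus naive Bayes is a classic text-classification recipe:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny, made-up training set: each message is labeled SPAM or NOT_SPAM.
messages = [
    "win a free prize now", "claim your free money",
    "cheap pills limited offer", "meeting moved to 3pm",
    "lunch tomorrow with the team", "project status update attached",
]
labels = ["SPAM", "SPAM", "SPAM", "NOT_SPAM", "NOT_SPAM", "NOT_SPAM"]

# Bag-of-words features + naive Bayes: the output is always one
# of the discrete labels seen during training.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(messages, labels)

print(model.predict(["free prize offer"]))    # → ['SPAM']
print(model.predict(["team lunch meeting"]))  # → ['NOT_SPAM']
```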
A model can only predict the classes it was trained to recognize. You've likely seen classification in action on social media: Facebook suggesting friends' names when you upload a picture, or businesses analyzing the sentiment of a tweet, are both examples of classification.
How do we create a good model?
The answer depends on the data we have and what we aspire to predict with the model. We might sometimes come across an existing pre-trained (in other words, pre-built) model that suffices for our needs and does what we expect. In that scenario, all we need to do is convert the model to Core ML and use it inside our iOS app.
Apple provides a number of ready-to-use Core ML models that can detect thousands of features and recognize a thousand different classes of objects. Training such a model from scratch requires very large datasets and a huge amount of computation, both of which can be expensive in terms of time and money.
For those reasons (and given limited expertise), training our own model from scratch to do the same thing might not be a wise choice. Instead, we can take a pre-trained model and customize it with our own data; this process is called transfer learning.
Transfer learning saves a lot of time, effort, and resources, as it’s much faster when compared to training an entire model from scratch. We don’t need a huge dataset to use a pre-trained model, and in many cases, we can get by with a few thousand images instead of millions of images.
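The idea can be sketched without any deep-learning framework: pretrain a small network on one task, freeze its hidden layer as a feature extractor, and train only a new "head" on a second, smaller task. This is a conceptual sketch of the technique, not how Apple's tools implement it:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)

# "Pretraining": learn features on one task (digits 0-4).
base = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
base.fit(X[y < 5], y[y < 5])

def frozen_features(X):
    # Forward pass through the frozen hidden layer (ReLU activations).
    return np.maximum(0, X @ base.coefs_[0] + base.intercepts_[0])

# Transfer: reuse those features for a new task (digits 5-9),
# training only a small logistic-regression head on top.
X_new, y_new = X[y >= 5], y[y >= 5]
head = LogisticRegression(max_iter=2000).fit(frozen_features(X_new), y_new)
print(round(head.score(frozen_features(X_new), y_new), 2))
```

Because only the small head is trained on the new task, far less data and compute are needed than retraining the whole network, which is exactly the saving transfer learning offers at scale.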
Apple provides two tools that perform transfer learning:
- Create ML
- Turi Create
In upcoming articles, we’ll learn how to create a binary image classifier and also discuss how ML works behind the scenes in iOS.
For other updates, you can follow me on Twitter: @NavRudraSambyal
Thanks for reading! Please share this article if you found it useful 🙂