Machine Learning Techniques for Predicting Customer Loyalty

Many organizations are using machine learning to analyze large customer databases and identify which customers are loyal or, perhaps more importantly, which customers are at the highest risk of churning.

Accurate churn prediction is extremely valuable: if the right steps are taken to retain at-risk customers, businesses can lift customer lifetime value (LTV) across an entire portfolio.

Using Machine Learning to Predict Churn: 7 Common Techniques

Below, you’ll find an overview of the most common techniques used in predicting churn.

1. Support Vector Machines (SVM)

SVM is a supervised learning method that analyzes a dataset with n features and tries to classify instances into one of two groups (binary classification). For example, the two groups for this use case would be: customers about to churn, and customers not about to churn. It creates an n-dimensional space, with each instance represented as a point in that space.

The algorithm tries to find a “hyperplane”, a mathematical construct within that multidimensional space that cuts through those points, separating them and leaving some points on one side of the hyperplane, and others on the other side, with as large a margin as possible.

In the telecom industry, for example, customer data features can be things like type of plan, minutes used, data used, number of customer service calls, number of customer service emails sent, etc. The SVM model can generate a prediction for each data point and predict whether the customer is in the “likely to churn” group or not.
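To make the hyperplane idea concrete, here is a minimal sketch of a linear SVM trained with plain subgradient descent on the hinge loss. The customer data, the function names, and the hyperparameters (`lam`, `lr`, `epochs`) are all made up for illustration; a production system would normally rely on a tested library implementation instead.

```python
# Toy, made-up data: (monthly minutes in hundreds, customer service calls).
customers = [(1.0, 5.0), (1.5, 4.0), (2.0, 6.0),   # churned
             (5.0, 1.0), (6.0, 0.0), (5.5, 2.0)]   # stayed
labels = [1, 1, 1, -1, -1, -1]  # +1 = likely to churn, -1 = not

def train_linear_svm(points, targets, lam=0.01, lr=0.01, epochs=2000):
    """Minimise lam/2 * ||w||^2 + mean hinge loss by subgradient descent."""
    w = [0.0] * len(points[0])
    b = 0.0
    n = len(points)
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for x, y in zip(points, targets):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # inside the margin or on the wrong side
                for j, xj in enumerate(x):
                    grad_w[j] -= y * xj / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Which side of the hyperplane w.x + b = 0 does the customer fall on?"""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

w, b = train_linear_svm(customers, labels)
print(predict(w, b, (1.0, 6.0)), predict(w, b, (6.0, 0.5)))
```

The learned `w` and `b` define the separating hyperplane; the sign of `w.x + b` places each customer in one of the two groups.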

2. Decision Trees

A decision tree is a model that creates a tree-like structure representing a sequential decision-making process. Each internal node in the tree tests one feature of the dataset, and the branches leaving a node correspond to the possible outcomes of that test.

At the end of the decision tree there are “leaf nodes”, which hold the predictions. Decision tree algorithms are highly flexible, and can predict both categorical classes and continuous values.

In our case, the prediction might be a categorical label such as “high churn risk”, “medium churn risk”, or “low churn risk”, or a continuous value such as a churn probability between 0 and 1.
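As a rough illustration, the sketch below builds the smallest possible tree: a single split on one feature, chosen by Gini impurity, with each leaf reporting the churn rate of the training customers that landed in it. The data and function names are hypothetical, and real decision tree learners repeat this split search recursively over many features.

```python
# Toy, made-up history: (customer service calls, churned? 1 = yes, 0 = no).
history = [(0, 0), (1, 0), (1, 0), (2, 0), (3, 1), (4, 0), (5, 1), (6, 1)]

def gini(rows):
    """Gini impurity of a group of rows with 0/1 labels."""
    if not rows:
        return 0.0
    p = sum(label for _, label in rows) / len(rows)
    return 2 * p * (1 - p)

def best_split(rows):
    """Try each observed value as a threshold; keep the lowest weighted Gini."""
    best = None
    for threshold in sorted({x for x, _ in rows}):
        left = [r for r in rows if r[0] <= threshold]
        right = [r for r in rows if r[0] > threshold]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if best is None or score < best[0]:
            best = (score, threshold, left, right)
    return best

def churn_rate(rows):
    return sum(label for _, label in rows) / len(rows)

_, threshold, left, right = best_split(history)

def predict_churn_probability(calls):
    """Follow the single internal node to a leaf and return its churn rate."""
    leaf = left if calls <= threshold else right
    return churn_rate(leaf)

print(threshold, predict_churn_probability(1), predict_churn_probability(5))
```

Here the leaves return a continuous value (a churn rate); mapping that rate to labels such as “low” or “high” risk would give a categorical output instead.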

3. Naive Bayes Algorithm

The Naive Bayes classification technique applies Bayes’ theorem to classify instances based on previously observed data. It is “naive” because it assumes the features are independent of one another within each class, with no interaction between them. In our case, that means each data feature of a customer contributes to the probability of churn on its own, with no relation to the other features.

Naive Bayes can work with multiple categorical labels. Its output is a probability that a specific instance belongs to each of the classes.

For example, Naive Bayes can analyze prior data and predict that a specific customer has a 10% chance of being in the “low churn risk” group, a 20% chance of being in the “medium risk” group, and a 70% chance of being in the “high risk” group.
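The calculation behind such per-class probabilities can be sketched as follows. This is a minimal categorical Naive Bayes with Laplace (add-one) smoothing; the training rows, feature names, and function names are invented for the example.

```python
from collections import Counter, defaultdict

# Toy, made-up training rows: ({feature: value}, churn risk label).
rows = [
    ({"plan": "prepaid", "support_calls": "many"}, "high"),
    ({"plan": "prepaid", "support_calls": "many"}, "high"),
    ({"plan": "prepaid", "support_calls": "few"}, "medium"),
    ({"plan": "contract", "support_calls": "many"}, "medium"),
    ({"plan": "contract", "support_calls": "few"}, "low"),
    ({"plan": "contract", "support_calls": "few"}, "low"),
]

def train(rows):
    class_counts = Counter(label for _, label in rows)
    value_counts = defaultdict(Counter)  # (label, feature) -> value counts
    values = defaultdict(set)            # feature -> distinct values seen
    for features, label in rows:
        for name, value in features.items():
            value_counts[(label, name)][value] += 1
            values[name].add(value)
    return class_counts, value_counts, values

def predict_proba(model, features):
    class_counts, value_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for name, value in features.items():
            # Smoothed P(value | class); the "naive" step is simply
            # multiplying these per-feature terms together.
            seen = value_counts[(label, name)][value]
            score *= (seen + 1) / (count + len(values[name]))
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

model = train(rows)
probs = predict_proba(model, {"plan": "prepaid", "support_calls": "many"})
print(probs)
```

The returned dictionary holds one probability per risk group, and the values sum to 1.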

4. Regression Analysis

Regression is a statistical technique that preceded machine learning, but it’s also used as part of many machine learning analyses. Logistic regression creates a model that explains how a set of independent variables (the features) contributes to a binary dependent variable (the outcome).

Applied to the problem of customer churn, logistic regression defines an equation with variables or features that are thought to impact churn, and estimates the best coefficient for each variable from a set of instances with a known result (for example, customers who did or did not churn after a set period of time). If the coefficients are statistically significant, the same equation can be used to predict churn for a customer with an as-yet unknown outcome.
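A minimal sketch of that coefficient estimation, using gradient descent on the log-loss, might look like this. The two features, the data, and the settings `lr` and `epochs` are assumptions made for the example, and the sketch does not compute significance tests, which a statistics package would provide.

```python
import math

# Toy, made-up rows: (customer service calls, years as a customer, churned?).
rows = (
    [(5, 0.5, 1)] * 3 + [(5, 0.5, 0)] * 1 +  # many calls, short tenure
    [(5, 2.5, 1)] * 2 + [(5, 2.5, 0)] * 2 +  # many calls, long tenure
    [(1, 0.5, 1)] * 2 + [(1, 0.5, 0)] * 2 +  # few calls, short tenure
    [(1, 2.5, 1)] * 1 + [(1, 2.5, 0)] * 3    # few calls, long tenure
)
X = [(calls, years) for calls, years, _ in rows]
y = [churned for _, _, churned in rows]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def churn_probability(coefs, intercept, x):
    """The fitted equation: sigmoid(intercept + sum of coefficient * feature)."""
    return sigmoid(intercept + sum(c * xi for c, xi in zip(coefs, x)))

def fit(X, y, lr=0.1, epochs=5000):
    coefs = [0.0] * len(X[0])
    intercept = 0.0
    n = len(X)
    for _ in range(epochs):
        errors = [churn_probability(coefs, intercept, x) - t
                  for x, t in zip(X, y)]
        for j in range(len(coefs)):
            coefs[j] -= lr * sum(e * x[j] for e, x in zip(errors, X)) / n
        intercept -= lr * sum(errors) / n
    return coefs, intercept

coefs, intercept = fit(X, y)
p_risky = churn_probability(coefs, intercept, (5, 0.5))
p_safe = churn_probability(coefs, intercept, (1, 2.5))
print(coefs, intercept, p_risky, p_safe)
```

Once fitted, the same equation can score a customer whose outcome is not yet known, as with `p_risky` and `p_safe` above.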

5. Instance-Based Learning

The most popular instance-based learning technique is the famous K-nearest neighbors (KNN) algorithm. KNN is a supervised but “lazy” learning technique: there is no training step that fits an explicit model to predict the churn outcome. It simply stores the labeled instances it has seen.

To classify a new instance, it places that instance in the feature space, finds the k stored instances that are “nearest” to it, and reaches a classification decision by counting the votes of those neighbors.

To predict churn, KNN compares a customer with the k most similar customers whose outcome is already known, and assigns the outcome that is most common among them.

If the features are strongly related to churn (for example, number of customer support cases, number of previous purchases, dollar value of purchases), this can be highly effective at identifying customers who are at high risk of churn.
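The neighbor vote can be sketched in a few lines. The customers, the two features, and the choice of `k=3` are made up for illustration; with features on very different scales (such as dollar values), they would normally be rescaled before distances are computed.

```python
import math
from collections import Counter

# Toy, made-up labeled customers: ((support cases, previous purchases), outcome).
labeled = [
    ((6, 1), "churned"), ((5, 2), "churned"), ((7, 1), "churned"),
    ((1, 8), "stayed"), ((0, 10), "stayed"), ((2, 7), "stayed"),
]

def knn_predict(labeled, customer, k=3):
    """Label a customer by majority vote of the k nearest labeled customers."""
    by_distance = sorted(labeled, key=lambda item: math.dist(item[0], customer))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(knn_predict(labeled, (6, 2)))  # near the churned customers
print(knn_predict(labeled, (1, 9)))  # near the customers who stayed
```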

6. Ensemble-Based Learning

A common ensemble learning technique is a random forest, which is an extension of the decision tree algorithm. It creates a large number of randomized decision trees and takes a majority vote of the decisions reached by all the trees.

This technique helps avoid overfitting, i.e. model predictions that are accurate on the training data but not on unseen data.

Random forests can help identify the combinations of features that place a customer at high risk of churn.
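The voting idea can be illustrated with a deliberately simplified stand-in for a random forest: each “tree” below is only a one-split stump, trained on a bootstrap sample of the rows and a randomly chosen feature. The data, names, and settings (`n_trees`, `seed`) are assumptions for the sketch; a real random forest grows full trees and samples features at every split.

```python
import random
from collections import Counter

# Toy, made-up rows: ((support calls, years as a customer), churned?).
rows = [((4, 0.2), 1), ((5, 0.5), 1), ((6, 0.3), 1), ((7, 1.0), 1),
        ((0, 2.0), 0), ((1, 3.0), 0), ((1, 4.0), 0), ((2, 2.5), 0)]

def train_stump(sample, feature):
    """Best single-threshold rule on one feature, judged by training errors."""
    best = None
    for threshold in sorted({x[feature] for x, _ in sample}):
        for above in (0, 1):  # label predicted when the value exceeds the threshold
            errors = sum((above if x[feature] > threshold else 1 - above) != y
                         for x, y in sample)
            if best is None or errors < best[0]:
                best = (errors, threshold, above)
    _, threshold, above = best
    return feature, threshold, above

def stump_predict(stump, x):
    feature, threshold, above = stump
    return above if x[feature] > threshold else 1 - above

def train_forest(rows, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # bootstrap sample of rows
        feature = rng.randrange(len(rows[0][0]))   # random feature per stump
        forest.append(train_stump(sample, feature))
    return forest

def forest_predict(forest, x):
    """Majority vote over the decisions of all stumps."""
    votes = Counter(stump_predict(stump, x) for stump in forest)
    return votes.most_common(1)[0][0]

forest = train_forest(rows)
print(forest_predict(forest, (7, 0.1)), forest_predict(forest, (0, 5.0)))
```

Because every stump sees a different sample and feature, no single quirk of the training data decides the outcome, which is the intuition behind the reduced overfitting.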

7. Artificial Neural Networks

An artificial neural network, of which the multi-layer perceptron is the classic form, accepts a set of inputs and passes them through multiple layers of neurons. Each neuron is very simple: it multiplies each of its inputs by a learned weight (which can be any real number, positive or negative), sums the results, adds a bias, and passes the total through a nonlinear activation function to the next layer of neurons.

Because many neurons are involved (up to millions in large neural networks), they can combine to perform very complex learning processes on the data.
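The sketch below shows one forward pass through a very small network: two inputs, three hidden neurons, and three outputs that a softmax turns into one probability per risk label. The weights are arbitrary, untrained numbers chosen only to make the example run, and the label names are assumptions.

```python
import math

LABELS = ["low", "medium", "high"]

# Arbitrary, untrained weights for illustration only.
W_HIDDEN = [[0.5, -0.3], [-0.8, 0.9], [0.2, 0.4]]  # 3 hidden neurons x 2 inputs
B_HIDDEN = [0.1, 0.0, -0.2]
W_OUT = [[0.7, -0.5, 0.1], [-0.2, 0.3, 0.6], [0.4, 0.8, -0.9]]  # 3 outputs x 3 hidden
B_OUT = [0.0, 0.1, -0.1]

def relu(z):
    return max(0.0, z)

def dense(inputs, weights, biases):
    """Each neuron: weighted sum of its inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(x):
    hidden = [relu(z) for z in dense(x, W_HIDDEN, B_HIDDEN)]
    return dict(zip(LABELS, softmax(dense(hidden, W_OUT, B_OUT))))

probs = predict_proba((0.8, 0.1))  # two scaled customer features
print(probs)
```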

The network is trained with “backpropagation”: the prediction error is propagated backward through the network to work out how much each weight contributed to it, and each weight is then adjusted so that the prediction moves closer to the correct result.
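As a hedged sketch of that weight adjustment, the code below runs backpropagation for a single training customer on a tiny network (two inputs, two hidden sigmoid neurons, one sigmoid output). The starting weights, the learning rate, and the input values are made up; real training loops over many customers in batches.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, w2, b2):
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    output = sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)
    return hidden, output

def loss(y, output):
    """Cross-entropy between the true label y and the predicted probability."""
    return -(y * math.log(output) + (1 - y) * math.log(1 - output))

def backprop_step(x, y, W1, b1, w2, b2, lr=0.5):
    hidden, output = forward(x, W1, b1, w2, b2)
    # With a sigmoid output and cross-entropy loss, the error signal at the
    # output neuron's weighted sum is simply (output - y).
    delta_out = output - y
    # Pass the error back to each hidden neuron through its outgoing weight.
    delta_hidden = [delta_out * w * h * (1 - h) for w, h in zip(w2, hidden)]
    new_w2 = [w - lr * delta_out * h for w, h in zip(w2, hidden)]
    new_b2 = b2 - lr * delta_out
    new_W1 = [[w - lr * d * xi for w, xi in zip(row, x)]
              for row, d in zip(W1, delta_hidden)]
    new_b1 = [b - lr * d for b, d in zip(b1, delta_hidden)]
    return new_W1, new_b1, new_w2, new_b2

x, y = (0.8, 0.1), 1  # one made-up customer who churned
params = ([[0.1, -0.2], [0.4, 0.3]], [0.0, 0.0], [0.2, -0.1], 0.0)
loss_before = loss(y, forward(x, *params)[1])
for _ in range(50):
    params = backprop_step(x, y, *params)
loss_after = loss(y, forward(x, *params)[1])
print(loss_before, loss_after)
```

Each step nudges every weight against its share of the error, so the loss for this customer shrinks as the steps accumulate.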

Neural networks can receive a set of data points for a customer and run the data through multiple layers of neurons, and the output is a probability for each of several result labels.

The advantage here is that, unlike many other machine learning techniques, deep learning can learn useful representations of the data on its own, with less manual feature engineering, and so pick up on which types of customers are more likely to churn and which are not.

Which Technique Wins Out?

A study published in the International Journal of Advanced Computer Science and Applications (Sabbeh, 2018) compared the performance of common machine learning techniques when applied to the problem of customer churn.

The study examined a database of customer data from a telecom company, with 17 explanatory features including time the account has been active, international plan, voicemail usage, number of minutes and calls during day, evening and night, international calls, and customer service calls, among others.

The results are shown below:

The best ML algorithms for predicting customer churn

The study found that random forest and AdaBoost, another type of ensemble learning technique, were the most effective algorithms, with accuracy between 93% and 96% on the telecom customer dataset. Neural networks (multi-layer perceptron) were a close second, with 93–94% accuracy.

The worst ML algorithms for predicting customer churn

The weakest performers were Naive Bayes (86–88% accuracy), logistic regression (84–87%), and linear discriminant analysis (LDA), which is not covered in our list of algorithms above, with accuracy of only 83–86%.
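For reference, accuracy in comparisons like this is the share of customers whose outcome was predicted correctly. A minimal sketch, with made-up labels that are not taken from the study:

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the actual outcome."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

# Made-up example: 1 = churned, 0 = stayed.
actual = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
predicted = [1, 0, 0, 0, 0, 0, 0, 1, 1, 0]
print(accuracy(actual, predicted))  # 8 of 10 correct
```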