Federated Learning: A Practical Guide to Decentralised Machine Learning

Enabling on-device training, model personalization, and more

There are over 5 billion mobile device users all over the world. These users generate massive amounts of data—through cameras, microphones, and sensors like accelerometers—which can be leveraged for building intelligent applications.

So, what is federated learning?
Federated learning is a method for training AI models directly on users’ devices, without moving the raw data to a central server.

The model learns locally, sends only updates back to the cloud, and keeps your personal data private.

Traditionally, data is collected in centralised data centres to train machine learning and deep learning models.

However, due to rising concerns over data privacy and bandwidth limitations, centralised learning isn’t always practical.

Users are much less likely to share personal data, which means valuable training data remains stuck on devices.

The outline of the article is as follows:

  • What Is Federated Learning?
  • Steps for Federated Learning
  • Data Is Available Everywhere
  • Properties of Problems Solved Using Federated Learning
  • Real-World Applications
  • Benefits and Limitations
  • What’s Next?

Let’s get started.

What Is Federated Learning?

Federated learning is a decentralised machine learning approach introduced by Google in 2016 through their research paper Communication-Efficient Learning of Deep Networks from Decentralized Data.

In traditional machine learning setups, data is centralised—collected from user devices and uploaded to a server where a model is trained. This model is then distributed back to devices. This centralised pipeline works, but it creates problems around:

  • Privacy
  • Scalability
  • Bandwidth usage
  • Data ownership and compliance

Federated learning turns that model on its head.

In federated learning:

  1. A shared global model is distributed to devices (called clients).
  2. Each client trains the model locally using their private data.
  3. The clients send only the model updates (not the data) back to the server.
  4. The server aggregates these updates to improve the global model.

This method allows data to remain on-device, satisfying privacy requirements while still enabling the creation of powerful models.
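To make the loop concrete, here is a minimal simulation of it in plain Python with NumPy. This is an illustrative sketch, not a production system: the "clients" are just in-process arrays standing in for private on-device data, and the model is a simple linear regression trained with local gradient descent.

```python
import numpy as np

# Illustrative setup: each "client" holds private (X, y) data for a linear
# model y ≈ X @ w. In a real deployment this data never leaves the device;
# here we simulate everything in one process.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(10):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    clients.append((X, y))

def local_train(w, X, y, lr=0.1, epochs=5):
    """Step 2: a client fine-tunes the global weights on its own data."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad
    return w

global_w = np.zeros(2)  # Step 1: the server initialises the global model

for round_num in range(20):
    # The server samples a subset of clients for this round.
    selected = rng.choice(len(clients), size=5, replace=False)
    # Steps 2-3: selected clients train locally and return only their weights.
    updates = [local_train(global_w, *clients[i]) for i in selected]
    # Step 4: the server aggregates the updates (a plain mean here; the
    # weighted version, federated averaging, is shown later).
    global_w = np.mean(updates, axis=0)

print("learned:", global_w, "target:", true_w)
```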

The main advantages include:

  • Data never leaves the device
  • Compliance with GDPR and data regulations
  • Reduced server load and network bandwidth
  • Personalised model updates tailored to individual device behaviour

This makes federated learning ideal for mobile phones, wearables, medical imaging, banking apps, and more.

Steps for Federated Learning

Federated learning consists of a straightforward loop repeated until the global model converges. The core process can be broken down into the following steps:

  1. A global model is initialised on the server
  2. A subset of clients (devices) is selected
  3. Clients receive the model and train locally on their data
  4. Clients send updated weights back to the server
  5. The server aggregates all client updates
  6. The global model is updated and redistributed
  7. The cycle repeats

Here’s a breakdown of each step:

  • Global Model Initialisation: A base model is created server-side, possibly using public or synthetic data.
  • Client Selection: Only a subset of all clients is chosen in each round to optimise efficiency and resource usage.
  • Local Training: Each selected client fine-tunes the model on their private data using an optimisation algorithm (e.g. stochastic gradient descent).
  • Weight Updates: Clients transmit updated model weights, not the data itself.
  • Aggregation: The server aggregates updates using algorithms like federated averaging (sketched right after this list).
  • Model Update: The aggregated model is distributed again, and the loop continues.
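The aggregation step is worth spelling out. Federated averaging (FedAvg), introduced in Google’s original paper, combines client weights with a data-size-weighted mean: w_global = Σ_k (n_k / n) · w_k, where n_k is the number of training examples on client k and n is the total across the selected clients. A minimal sketch (the function and variable names are illustrative):

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """FedAvg aggregation: average client parameters, weighting each client
    by the number of local training examples it used."""
    total = sum(client_sizes)
    stacked = np.stack(client_weights)       # shape: (num_clients, ...)
    coeffs = np.array(client_sizes) / total  # n_k / n for each client
    # Reshape so the coefficients broadcast over the parameter dimensions.
    coeffs = coeffs.reshape(-1, *([1] * (stacked.ndim - 1)))
    return (coeffs * stacked).sum(axis=0)

# Example: three clients with different amounts of local data.
w = federated_average(
    [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])],
    client_sizes=[10, 30, 60],
)
print(w)  # clients with more data pull the average harder: [0.7 0.9]
```

Weighting by dataset size keeps a client that trained on ten examples from having the same influence as one that trained on ten thousand.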

Here’s a simplified view:

Step | Description
1    | Send global model to devices
2    | Train model locally on private data
3    | Send only updates (not raw data)
4    | Server aggregates updates
5    | Updated model is redistributed
This workflow ensures a balance between privacy and accuracy.

Google has also published a video that summarises the idea of federated learning.

Data Is Available Everywhere

In the data-driven era we live in, building intelligent applications depends on one key resource—data. The good news is that data is being generated constantly, in real-time, and in large volumes. The bad news is that most of it is difficult to access.

Today, mobile phones, smartwatches, smart home devices, and other sensor-laden IoT products are primary data sources. These devices are always-on, frequently used, and closely tied to individual behaviour, making them ideal platforms for data collection.

According to the GSMA Mobile Economy report, the number of mobile users reached 5.2 billion in 2019 and is projected to increase to 5.8 billion by 2025. Among them, 3.8 billion are connected to the internet, further amplifying the scale of data being produced.

To put this in perspective:

Year | Mobile Users (Billions) | Internet-Connected Users (Billions)
2019 | 5.2                     | 3.8
2025 | 5.8 (projected)         | >4.5 (estimated)

The same report also indicates the presence of 12 billion IoT devices in 2019, which is expected to double to 24.6 billion by 2025. These devices add another layer of continuous data generation.

According to the Pew Research Center, the majority of these mobile devices are smartphones. They gather and process contextual information such as location, motion, and environmental sound—offering vast potential for training highly personalised machine learning models.

All of this highlights a critical reality: data is abundant, but accessing it safely and legally is becoming increasingly difficult.

The existence of so many data generators means data really is available everywhere. Every tap and click adds more information about a user’s interests, which can be used to build intelligent applications with better user experiences.

Federated learning exists precisely to learn from this private data without compromising users’ privacy.

Properties of Problems Solved Using Federated Learning

Federated learning isn’t ideal for every problem. Google’s research paper highlights three key properties of tasks that benefit most from this approach.

1. Data Is Heterogeneous and Personalised

Different users use devices differently. A generic model trained on pooled data may not reflect the unique behaviour of each user. With federated learning:

  • The model can be personalised locally to each user
  • The training data stays relevant and contextual
  • Users experience better model performance on their device

This is especially useful for predictive keyboards, voice assistants, and personal recommendation systems.
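One simple way to achieve this personalisation, sketched below as a toy example (not any specific framework’s API), is to freeze the shared global weights and fine-tune a small device-specific parameter on local data:

```python
import numpy as np

# Toy personalisation: freeze the shared global weights and fine-tune only
# a small per-user bias on the device's own data. The bias never leaves
# the device, so personalisation costs nothing in privacy.
def personalise(global_w, X, y, lr=0.05, steps=100):
    bias = 0.0  # device-specific parameter, kept local
    for _ in range(steps):
        pred = X @ global_w + bias
        bias -= lr * 2 * np.mean(pred - y)  # MSE gradient w.r.t. the bias
    return bias

# A user whose behaviour is systematically offset from the global model:
rng = np.random.default_rng(1)
global_w = np.array([2.0, -1.0])
X = rng.normal(size=(40, 2))
y = X @ global_w + 3.0               # this user's data has a +3 offset
print(personalise(global_w, X, y))   # ≈ 3.0, learned entirely on-device
```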

2. Uploading Data Isn’t Feasible

Asking users to upload large datasets to a central server can be:

  • Costly (due to bandwidth and data plans)
  • Invasive (due to privacy concerns)
  • Technically difficult (due to connectivity or device limitations)

Federated learning avoids all of these issues by ensuring training happens on-device.
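The bandwidth asymmetry is easy to quantify. Here is a back-of-the-envelope comparison; the model size and audio format below are assumed, illustrative numbers:

```python
# Back-of-the-envelope: update size vs. raw-data size (illustrative numbers).
params = 1_000_000                            # a small on-device model
update_mb = params * 4 / 1e6                  # float32 weights -> ~4 MB per round
audio_mb_per_hour = 16_000 * 2 * 3600 / 1e6   # 16 kHz, 16-bit mono audio
print(f"model update:     {update_mb:.0f} MB")          # ~4 MB
print(f"1 hour raw audio: {audio_mb_per_hour:.0f} MB")  # ~115 MB
```

A few megabytes of weights per round is manageable; continuously uploading raw audio or sensor streams is not.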

3. Training Requires Significant Resources

Mobile devices have limited compute power and battery life. To handle this, frameworks like TensorFlow Federated allow for:

  • Lightweight training processes
  • Background training (only when devices are idle, charging, and on Wi-Fi)
  • Energy-efficient computation

This makes federated learning a scalable and device-friendly solution.
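In practice, the on-device scheduler gates training on exactly those conditions. A hypothetical sketch, where DeviceState stands in for real platform status APIs:

```python
from dataclasses import dataclass

@dataclass
class DeviceState:
    """Hypothetical stand-in for platform APIs that report device status."""
    idle: bool
    charging: bool
    on_wifi: bool

def should_train(state: DeviceState) -> bool:
    # Train only when it won't hurt the user experience: the device is
    # idle, plugged in, and on an unmetered (Wi-Fi) connection.
    return state.idle and state.charging and state.on_wifi

# A phone charging overnight on home Wi-Fi qualifies; one in active use does not.
print(should_train(DeviceState(idle=True, charging=True, on_wifi=True)))   # True
print(should_train(DeviceState(idle=False, charging=True, on_wifi=True)))  # False
```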

Real-World Applications

Federated learning is already being used by top tech companies in real-world products.

Google Gboard

  • Gboard uses federated learning to personalise next-word prediction
  • Server traffic reduced by 45%
  • User privacy is preserved since typing data never leaves the device

Apple Siri

  • Apple uses on-device training to personalise voice recognition
  • Raw voice data is never uploaded
  • Local learning adapts to user speech patterns over time

Healthcare

  • Hospitals use federated learning to train diagnostic models without sharing sensitive patient data
  • A Mayo Clinic + NVIDIA pilot increased medical imaging AI accuracy by 20–30%

Banking

  • Banks apply FL to detect fraud across decentralised data sources
  • Capital One reported a 12% improvement in fraud detection rates using FL

Benefits and Limitations

Advantages

  • Privacy-first: Raw data never leaves the device
  • Compliance-friendly: Supports GDPR and other laws
  • Bandwidth-efficient: Only model updates are sent
  • Personalised: Adapts to user-specific data

Challenges

  • Slower convergence: Training can take longer
  • Heterogeneous hardware: Devices vary in power and availability
  • Security: Still vulnerable to model poisoning and inversion attacks
  • Debugging: Harder to trace errors across decentralised systems

Here’s a comparison:

Feature                | Centralised ML | Federated Learning
Data Privacy           | Low            | High
Bandwidth Usage        | High           | Low
Model Personalisation  | Low            | High
Compute Requirements   | Central        | Distributed
Training Speed         | Fast           | Slower
Regulation Compliance  | Risky          | Safe

What’s Next?

Federated learning is still evolving. As frameworks mature and devices become more powerful, we can expect to see broader adoption across:

  • Smart homes
  • Autonomous vehicles
  • Voice assistants
  • Wearables
  • Healthcare diagnostics

Frameworks like TensorFlow Federated, PySyft by OpenMined, and PyTorch-based federated learning libraries are making it easier to implement FL in production.

Meanwhile, research continues to tackle its biggest challenges: improving training efficiency, boosting robustness against adversarial attacks, and simplifying debugging workflows.
