Swifty ML: An Intro to Swift for TensorFlow

Taking a peek at what Google believes is the future language of machine learning.

When it comes to machine learning, Python has been dominant. However, we can already see that Python has limits on how far it can scale with modern ML demands.

Google seems to have taken this into account in planning for the future of ML (especially through the lens of TensorFlow). They realize that, instead of overhauling Python, a more modern and adaptable language could change the game.

Hence, Swift for TensorFlow (S4TF). All Apple vs. Google politics aside, it’s a bold statement by Google on Swift’s capabilities in the modern world amongst the programming language greats. To solidify their buy-in, the project is led by Papa Swift himself, Chris Lattner.

LeNet-5 and MNIST in Swift for TF

In this article, I’d like to walk you through training the LeNet-5 network on MNIST, both of which are already provided in the official S4TF swift-models repo. Our example will be a modified version of the one found in that repo’s examples.

For our environment, we’ll be using Google Colab. Think of it as a cloud-based, modified Jupyter Notebook environment. Using Colab is the fastest way to jump into S4TF without having to install or set up anything. Plus, it gives us free access to CPUs, GPUs, and even TPUs! (For more info, check out our article here).

Let’s begin!

Opening the Colab Notebook

If you haven’t done so already, go to http://colab.research.google.com/ and set up your account/login. Once you’ve made it to the welcome screen, go to the GitHub option, enter dbolella/s4tf-lenet-mnist, and then select the lenet-mnist-swift-models.ipynb file (see the below screenshot).

This will be the notebook we work off of. To ensure we’re ready for S4TF, go to Runtime → Change runtime type. There you should see our Runtime type set to Swift and our Hardware accelerator set to GPU. Hit cancel or save to return to the notebook.

You’ll notice we have 4 code sections. As we go through each, hit the play button in the top left corner of the code snippet so our code runs as we go. Remember to run them in the proper order. If you run into any problems, you can always go to Runtime → Restart Runtime to clear your progress and start over. (Don’t be afraid to restart if you need to; this model doesn’t take long to run compared to others!)

Installing and Importing Libraries

First, we need to pull in 2 libraries from swift-models: ImageClassificationModels and Datasets. If we’re running locally, we can use Swift Package Manager or manually pull in the classes we need.

On Colab, we can use %install to accomplish this. Since swift-models is set up as a Swift package, we reference the repo as a package and provide the URL and the branch we want to use (you can also pin a version instead). At the end of the line, we list the libraries we specifically need. Since they come from the same repo/package, we don’t need 2 separate lines (in fact, don’t do that, or you’ll get an error).

Once complete, we then start our Swift code by importing the libraries we’ll be using (TensorFlow is already available on Colab, no need to install):

Model, Dataset, Optimizer… Oh My!

Next, we want to instantiate the dataset, model, and optimizer we’ll be using. First, we set our batchSize (in this case to 128), which will determine how much data we’ll train with at a time.
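To get a feel for what a batchSize of 128 means in practice, here's a quick bit of arithmetic in plain Swift. It assumes the standard MNIST split of 60,000 training images and 10,000 test images, and it mirrors the trainingSize / batchSize expression used by the training loop later in this article. The constant names here are just for illustration.

```swift
// Assumption: the standard MNIST split (60,000 training / 10,000 test images).
let trainingSize = 60_000
let testSize = 10_000
let batchSize = 128

// Integer division drops any final partial batch.
let trainingBatches = trainingSize / batchSize
let testBatches = testSize / batchSize

// Images left over once the full batches have been taken.
let skippedTrainingImages = trainingSize % batchSize
let skippedTestImages = testSize % batchSize

print("Training batches per epoch: \(trainingBatches), images skipped: \(skippedTrainingImages)")
print("Test batches per epoch: \(testBatches), images skipped: \(skippedTestImages)")
```

Under that assumption, each epoch runs 468 training batches and 78 test batches, and a handful of images at the tail end of each set never make it into a batch.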

We then use batchSize as we set up our dataset, which we create simply with MNIST(batchSize: batchSize). This essentially pulls our data and massages it for use in our model (to look under the hood, check it out here).

Similarly, we set our model with LeNet(). If you’re familiar with how LeNet is set up in Python, then you’ll notice that the Swift version looks extremely similar. You can check it out here, but let’s take a look:

This model is built as a Swift struct that conforms to the Layer protocol, which represents a neural network layer. A layer takes an input, passes it through, and returns the resulting output. The protocol enforces this by requiring the struct to implement callAsFunction, which is what lets us call the model as if it were a function, as in model(images).
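If callAsFunction is new to you, here's a minimal sketch of the idea in plain Swift (5.2 or later), with no TensorFlow involved. ToyLayer, Scale, and ToyModel are made-up names for illustration; they are not the real Layer protocol or anything from swift-models.

```swift
// Hypothetical stand-in for the Layer protocol: anything that maps an input to an output.
protocol ToyLayer {
    func callAsFunction(_ input: [Float]) -> [Float]
}

// A toy layer that multiplies every element by a fixed factor.
struct Scale: ToyLayer {
    var factor: Float

    func callAsFunction(_ input: [Float]) -> [Float] {
        return input.map { $0 * factor }
    }
}

// A toy model that applies two layers in sequence.
struct ToyModel: ToyLayer {
    var first = Scale(factor: 2)
    var second = Scale(factor: 0.5)

    func callAsFunction(_ input: [Float]) -> [Float] {
        // Each layer is invoked with plain call syntax.
        return second(first(input))
    }
}

let model = ToyModel()
let output = model([1, 2, 3])  // shorthand for model.callAsFunction([1, 2, 3])
print(output)
```

The takeaway is that any type implementing callAsFunction can be invoked like a function, which is why a whole network and each of its layers can be chained together with ordinary call syntax.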

Next, we set our optimizer using SGD(for: model, learningRate: 0.1), which is a built-in stochastic gradient descent optimizer. Lastly, we set up our epochCount (the number of times we’ll have the model work through the dataset during training).

Benchmarking Prep

Before we start training our model, we want to be able to collect some benchmarking statistics so we know our model’s accuracy and whether it’s actually improving over each epoch. We can simply use a struct to hold these values for both our training and testing sets.

The total loss…well, that’s rather self-explanatory. The guess counts (correctGuessCount and totalGuessCount) will be used to determine the train and test accuracies at the end of each epoch. To limit duplicating code in our training loop, I’ve included a simple helper function in our struct, updateGuessCounts, to collect and massage the guess count data.
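To make the bookkeeping concrete, here's a rough plain-Swift sketch of what such a struct could look like. Big caveat: the notebook's version works on tensors, while this stand-in assumes logits arrive as an array of per-image score rows and labels as plain Ints, so that it runs without TensorFlow. The accuracy property is also my own addition for illustration.

```swift
// Plain-Swift stand-in: each row of `logits` holds one image's class scores.
struct Statistics {
    var correctGuessCount = 0
    var totalGuessCount = 0
    var totalLoss: Float = 0

    // Counts a guess as correct when the highest-scoring class matches the label.
    mutating func updateGuessCounts(logits: [[Float]], labels: [Int], batchSize: Int) {
        for (scores, label) in zip(logits, labels) {
            let predicted = scores.indices.max(by: { scores[$0] < scores[$1] })
            if predicted == label {
                correctGuessCount += 1
            }
        }
        totalGuessCount += batchSize
    }

    // Hypothetical convenience: fraction of guesses that were correct.
    var accuracy: Float {
        return totalGuessCount == 0 ? 0 : Float(correctGuessCount) / Float(totalGuessCount)
    }
}

var stats = Statistics()
stats.updateGuessCounts(
    logits: [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]],
    labels: [1, 0, 0, 0],
    batchSize: 4)
print("\(stats.correctGuessCount)/\(stats.totalGuessCount) correct")
```

In this tiny batch, three of the four top-scoring classes match their labels, so the struct ends up holding 3 correct guesses out of 4.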

Training Day

Lastly, we train our model! We run the training loop based on our epochCount. Inside we have 2 more loops: one for iteratively training through the training data and another to iteratively test against the testing data.

// The training loop.
for epoch in 1...epochCount {
    var trainStats = Statistics()
    var testStats = Statistics()

    // Training pass: update the model one mini-batch at a time.
    Context.local.learningPhase = .training
    for i in 0 ..< dataset.trainingSize / batchSize {
        let images = dataset.trainingImages.minibatch(at: i, batchSize: batchSize)
        let labels = dataset.trainingLabels.minibatch(at: i, batchSize: batchSize)
        // Compute the gradient with respect to the model.
        let (loss, gradients) = valueWithGradient(at: model) { model -> Tensor<Float> in
            let logits = model(images)
            trainStats.updateGuessCounts(logits: logits, labels: labels, batchSize: batchSize)
            return softmaxCrossEntropy(logits: logits, labels: labels)
        }
        trainStats.totalLoss += loss.scalarized()
        optimizer.update(&model, along: gradients)
    }

    // Evaluation pass: measure loss and accuracy without updating the model.
    Context.local.learningPhase = .inference
    for i in 0 ..< dataset.testSize / batchSize {
        let images = dataset.testImages.minibatch(at: i, batchSize: batchSize)
        let labels = dataset.testLabels.minibatch(at: i, batchSize: batchSize)
        // Compute loss on test set
        let logits = model(images)
        testStats.updateGuessCounts(logits: logits, labels: labels, batchSize: batchSize)
        let loss = softmaxCrossEntropy(logits: logits, labels: labels)
        testStats.totalLoss += loss.scalarized()
    }

    let trainAccuracy = Float(trainStats.correctGuessCount) / Float(trainStats.totalGuessCount)
    let testAccuracy = Float(testStats.correctGuessCount) / Float(testStats.totalGuessCount)
    // Print the benchmarks for this epoch on a single line.
    print(
        """
        [Epoch \(epoch)] \
        Training Loss: \(trainStats.totalLoss), \
        Training Accuracy: \(trainStats.correctGuessCount)/\(trainStats.totalGuessCount) \
        (\(trainAccuracy)), \
        Test Loss: \(testStats.totalLoss), \
        Test Accuracy: \(testStats.correctGuessCount)/\(testStats.totalGuessCount) \
        (\(testAccuracy))
        """)
}

In the training loop, we run the training data through the model, but do so inside valueWithGradient. This returns two things separately: the loss (the value) and the gradients of that loss with respect to the model. We add the loss to our running total and hand the gradients to optimizer.update, which adjusts the model. You’ll also notice that we update our benchmarking statistics on each pass.
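If the value-plus-gradient idea feels abstract, here's a tiny sketch in plain Swift. Everything in it is an assumption for illustration: the "model" is a single weight w, the loss is a squared error, and the gradient is written out by hand in a made-up lossAndGradient function, whereas S4TF derives gradients for us automatically. The update line follows the basic gradient descent rule of stepping against the gradient.

```swift
// Hand-written stand-in for valueWithGradient: returns the loss and its
// derivative with respect to the single weight `w`.
func lossAndGradient(w: Float, x: Float, y: Float) -> (loss: Float, gradient: Float) {
    let error = w * x - y
    return (error * error, 2 * error * x)
}

var w: Float = 0                 // the "model": one weight
let learningRate: Float = 0.1
let x: Float = 2                 // one training input
let y: Float = 6                 // its target, so the ideal weight is 3

var losses: [Float] = []
for step in 1...5 {
    let (loss, gradient) = lossAndGradient(w: w, x: x, y: y)
    losses.append(loss)
    // Basic gradient descent: move the weight against the gradient.
    w -= learningRate * gradient
    print("step \(step): loss \(loss), w \(w)")
}
```

Running this, the loss shrinks on every step while w closes in on 3, which is the same pattern we hope to see across epochs with the real model.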

The testing loop is very similar, except that we don’t run our model inside of valueWithGradient, since we don’t need the gradient for test benchmarks.

At the end of each epoch, we print out our benchmark data. As our model continues to train, we should see our loss decrease and our accuracy increase with each epoch.


By now, my hope is that Colab has gotten through a few epochs and we can already see our model improving. Congratulations, you’ve just trained a model using Swift for TensorFlow! (Check out my benchmarks below):

This was, without a doubt, a basic example, but it shows how similar setting up and training a model in Swift is to doing so in Python. We barely touched on the advantages of using the Swift language, including the built-in Python Interop, which gives us access to many libraries and functions we’ve come to know and love in the ML community. But that’s for another time…
