Real-Time Breath Tracking via AirPods

Those of you who have used meditation or breathing apps before might be familiar with this situation: you’re supposed to follow a simple breathing pattern. Inhale for four seconds, hold, exhale for four seconds. Sounds easy, but it can actually be pretty hard.

Soon after starting the exercise your mind starts drifting and you forget about that deep breath. But while you’re distracted, the exercise in your app keeps going until you’ve added another day to your mindfulness streak.

So let’s set out to build an iOS app that can tell whether you’re breathing or not. To achieve this we’ll have the app listen to our breath in real-time via the microphone of our AirPods.

In the first step we’ll look at how to access the microphone. The audio stream will then be recorded on our phone and processed to train a Sound Classifier model in Create ML. Finally, the audio is used for real-time inference on the phone.

Accessing the Microphone of your AirPods

Go ahead and open a new SwiftUI App in Xcode.

In a new file we create a class called AudioAnalyzer which will be used for all the features of our app. An AVAudioEngine is needed to tap our AirPods and AVAudioSession allows us to set the general audio configuration in our app.

import AVFoundation
class AudioAnalyzer: NSObject { 
    var audioEngine: AVAudioEngine
    var audioSession: AVAudioSession
}

During initialization of our AudioAnalyzer class, we instantiate the AVAudioEngine and allow recording as well as Bluetooth input.

override init() {
    audioEngine = AVAudioEngine()
    audioSession = AVAudioSession.sharedInstance()
    do {
        try audioSession.setCategory(.playAndRecord,
                                     mode: .measurement,
                                     options: [.allowBluetooth])
        try audioSession.setActive(true)
    } catch {
        print("Failed to configure the audio session: \(error)")
    }
}

The AVAudioEngine is a powerful audio processing tool, but our setup is very simple. We only use the three default nodes:

  • inputNode
  • mainMixerNode
  • outputNode

In a new function startAudioEngine() we connect the nodes, place the tap and start the AVAudioEngine. This function will later be called from a play button in our UI.

func startAudioEngine() {
    audioEngine.reset()
    audioEngine.connect(audioEngine.inputNode,
                        to: audioEngine.mainMixerNode,
                        format: audioEngine.inputNode.outputFormat(forBus: 0))
    audioEngine.connect(audioEngine.mainMixerNode,
                        to: audioEngine.outputNode,
                        format: audioEngine.mainMixerNode.outputFormat(forBus: 0))
    tapMicrophoneForRecording()
    do {
        try audioEngine.start()
    } catch {
        print("Failed to start the audio engine: \(error)")
    }
}

The tap is installed on the output stream of the inputNode, which will be the microphone of our AirPods. That’s how easy it is to access the audio buffer.

func tapMicrophoneForRecording() {
    audioEngine.inputNode.removeTap(onBus: 0)
    audioEngine.inputNode.installTap(onBus: 0,
                                     bufferSize: 1024,
                                     format: audioEngine.inputNode.outputFormat(forBus: 0)) { buffer, time in
    }
}

One important note at this point: the raw audio will have the audio format of the inputNode. AirPods use a sample rate of 16 kHz for their microphone.
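To check what your setup delivers, you can print the input format before installing the tap; the exact values depend on the connected device:

let format = audioEngine.inputNode.outputFormat(forBus: 0)
print("Sample rate: \(format.sampleRate) Hz, channels: \(format.channelCount)")
// With AirPods as input this typically reports 16000.0 Hz mono; the
// built-in microphone runs at a higher rate such as 48000.0 Hz.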

Recording the Breath and Training a CoreML Model

Now that we’ve got the audio buffer we need to save it to our phone. So let’s extend the previous function. We create a URL for the file within the document directory and write the buffer to it.

func tapMicrophoneForRecording() {
    let fileName = "recorded_breath.caf"
    let documentsPath = NSSearchPathForDirectoriesInDomains(.documentDirectory, .userDomainMask, true)[0]
    let fileURL = URL(fileURLWithPath: documentsPath).appendingPathComponent(fileName)
    guard let recordingFile = try? AVAudioFile(forWriting: fileURL,
                                               settings: audioEngine.inputNode.outputFormat(forBus: 0).settings) else { return }
    audioEngine.inputNode.removeTap(onBus: 0)
    audioEngine.inputNode.installTap(onBus: 0,
                                     bufferSize: 1024,
                                     format: audioEngine.inputNode.outputFormat(forBus: 0)) { buffer, time in
        do {
            try recordingFile.write(from: buffer)
        } catch {
            print("Failed to write audio buffer: \(error)")
        }
    }
}

Our function to start the AVAudioEngine is now complete and we build a second function to stop it.

func stopAudioEngine() {
    audioEngine.stop()
}
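If you want a clean slate between sessions, you could also remove the tap when stopping. This is optional, since both of our tap functions already call removeTap(onBus:) before installing a new one:

func stopAudioEngine() {
    audioEngine.inputNode.removeTap(onBus: 0)
    audioEngine.stop()
}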

That’s all the code it takes to make our first recording. Create an instance of the AudioAnalyzer class in ContentView. Note that to observe it with @StateObject, AudioAnalyzer needs to conform to ObservableObject — we’ll rely on that conformance for UI updates later anyway. The UI will be as simple as two buttons to start and stop the recording.

struct ContentView: View { 
    @StateObject var audioAnalyzer = AudioAnalyzer()
    var body: some View {
        HStack {
            Button(action: {audioAnalyzer.stopAudioEngine()}, label: {
                Image(systemName: "square.fill")
                    .font(.largeTitle)
            })
            Button(action: {audioAnalyzer.startAudioEngine()}, label: {
                Image(systemName: "play.fill")
                    .font(.largeTitle)
            })
        }
    }
}

One last thing: a microphone usage description (the NSMicrophoneUsageDescription key) needs to be added to our Info.plist.
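In the source view of Info.plist, the entry looks like this; the description string is just an example and is shown to the user in the permission prompt:

<key>NSMicrophoneUsageDescription</key>
<string>This app listens to your breath via the microphone.</string>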

We are good to go. Run the app on a physical device and connect your AirPods. Also make sure to be in a quiet environment. Tap the play button and you’ll hear the creepy sound of your own breath while the app is recording.

Keep going for a few minutes. This is the data gathering for our CoreML model and the more data we get, the better the predictions will be.

To get to the file we open the “Devices and Simulators” window in Xcode and download the container of our app. The container can be opened via “Show Package Contents” and the recording is saved in its “Documents” folder.

The file can now be imported into an audio editing software. Audacity is a great open-source option for this.

The waveform of your file will look similar to this:

Instead of the waveform, we can also take a look at the spectrogram view. This shows the intensity of different frequencies and makes it easier to tell breathing sounds from background noise.

Next, we need to cut the recording into separate “breathing” and “not breathing” files. The duration of the files needs to be >0.975 sec. This work is a little tedious, but data preprocessing is often not the most fun part of a ML project. You should end up with something like the graph below. The diagram on the top shows a “breathing” file while the bottom is a “not breathing” file.

Now copy the files into two separate subfolders, one per class. That’s the required format for Create ML training data.
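The resulting layout could look like the sketch below. The file names don’t matter, but the folder names become the class labels in Create ML:

TrainingData/
├── breathing/
│   ├── breathing_001.caf
│   └── breathing_002.caf
└── not_breathing/
    ├── not_breathing_001.caf
    └── not_breathing_002.caf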

You might be wondering why we didn’t just use the “Voice Memos” app for the recording. Good point. Earlier we set the mode of the AVAudioSession to .measurement. According to Apple, this will give you the rawest audio possible. Recordings from the “Voice Memos” app for example have high frequencies removed which could be useful features for our model. AirPods Pro will also lead to better recordings than normal AirPods.

Finally, we can train a neural network on our data. Open a new document in Create ML and pick the Sound Classification template.

Drag the breathing files folder onto the training data section. I’m using a total of 551 files divided into the two classes, but even with fewer files it’s possible to achieve good results.

Press the play button and it won’t take long for Create ML to train the model. The two classes are quite distinct, and the model achieves high accuracy for training and validation after 10 iterations.

We save our model and copy it into our Xcode project folder. Now it’s ready to be deployed in our app.

Real-time Inference

In our AudioAnalyzer class we create a variable of the same type as our CoreML model. We also have an optional SNAudioStreamAnalyzer. This object will take the sound buffer and process it through our model. It’s optional because the AirPods might not be connected when the app starts and we need the correct audio format. So we simply create it later when the play button is tapped.

Then we need variables for the prediction and the confidence. We’re adding the @Published property wrapper and the ObservableObject protocol to make sure our UI will be updated every time a new prediction is available. SNResultsObserving is required to receive the results of the classification.

import SoundAnalysis

class AudioAnalyzer: NSObject, SNResultsObserving, ObservableObject {
    var soundClassifier: BreathCheckerModel
    var analyzer: SNAudioStreamAnalyzer?
    
    @Published var prediction = ""
    @Published var confidence = 0
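The soundClassifier property still needs a value. A minimal way to do this, assuming the model was saved as BreathCheckerModel (Xcode generates a Swift class with the same name from the .mlmodel file), is to extend the initializer from earlier. MLModelConfiguration lives in CoreML, so add that import if needed:

override init() {
    // Generated model classes expose a throwing init(configuration:);
    // try! will crash at launch if the model can't be loaded.
    soundClassifier = try! BreathCheckerModel(configuration: MLModelConfiguration())
    audioEngine = AVAudioEngine()
    audioSession = AVAudioSession.sharedInstance()
    super.init()
    // ... audio session setup as before ...
}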

Going back to starting the AVAudioEngine, we replace the function tapMicrophoneForRecording() with one that analyzes the audio buffer instead of writing it to file. The SNAudioStreamAnalyzer is instantiated here and our pre-trained model is added via a SNClassifySoundRequest.

func tapMicrophoneForInference() {
    let inputFormat = audioEngine.inputNode.outputFormat(forBus: 0)
    analyzer = SNAudioStreamAnalyzer(format: inputFormat)
    do {
        let request = try SNClassifySoundRequest(mlModel: soundClassifier.model)
        try analyzer?.add(request, withObserver: self)
    } catch {
        print("Failed to add the classification request: \(error)")
        return
    }
    audioEngine.inputNode.removeTap(onBus: 0)
    audioEngine.inputNode.installTap(onBus: 0,
                                     bufferSize: 1024,
                                     format: inputFormat) { buffer, time in
        DispatchQueue.main.async {
            self.analyzer?.analyze(buffer, atAudioFramePosition: time.sampleTime)
        }
    }
}

Every time the SNAudioStreamAnalyzer produces a new analysis result, it calls the request(_:didProduce:) method of its observer; implementing it is what makes our class conform to SNResultsObserving. We grab the top classification result and assign it to our prediction and confidence variables.

func request(_ request: SNRequest, didProduce result: SNResult) {
    guard let result = result as? SNClassificationResult,
          let classification = result.classifications.first else { return }
    DispatchQueue.main.async {
        self.prediction = classification.identifier
        self.confidence = Int(classification.confidence * 100)
    }
}
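SNResultsObserving also offers optional callbacks for failures and completion; wiring them up to a simple print statement helps when debugging the stream:

func request(_ request: SNRequest, didFailWithError error: Error) {
    print("Sound analysis failed: \(error.localizedDescription)")
}

func requestDidComplete(_ request: SNRequest) {
    print("Sound analysis request completed")
}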

The last step is to update the UI in ContentView. Above our play and stop buttons, we add HStacks for prediction and confidence.

struct ContentView: View {
    
    @StateObject var audioAnalyzer = AudioAnalyzer()
    
    var body: some View {
        VStack(alignment: .leading) {
            HStack {
                Text("Prediction:")
                Text(audioAnalyzer.prediction)
            }
            HStack {
                Text("Confidence:")
Text("\(audioAnalyzer.confidence) %")
            }
            HStack {
                Button(action: {audioAnalyzer.stopAudioEngine()}, label: {
                    Image(systemName: "square.fill")
                        .font(.largeTitle)
                })
                Button(action: {audioAnalyzer.startAudioEngine()}, label: {
                    Image(systemName: "play.fill")
                        .font(.largeTitle)
                })
            }
        }
    }
}

Our app doesn’t look pretty, but we now get real-time predictions of our breathing. A new prediction is made every ~0.5 seconds and the accuracy still feels good, especially on AirPods Pro.

And if you don’t like hearing your own breath, you can silence the AVAudioEngine by adding the following line to the startAudioEngine function.

audioEngine.mainMixerNode.outputVolume = 0

I hope you liked this mobile machine learning use-case. The full code can be found here:

You’re also welcome to check out “4–4 Focus” on the Apple App Store to try the tracking in a breathing exercise. The SoundAnalysis framework works great for any kind of sound classification. Let me know what other applications you come up with!

Fritz

Our team has been at the forefront of Artificial Intelligence and Machine Learning research for more than 15 years and we're using our collective intelligence to help others learn, understand and grow using these new technologies in ethical and sustainable ways.
