On-Device Video Subtitle Generation in SwiftUI

Making foreign language videos more accessible by combining video, SFSpeechRecognizer, and ML Kit’s Translation API

Amongst all the cool new tech releases and updates that happened last year, three in particular stood out to me:

  • SwiftUI, because I’m so over XIBs and Storyboards.
  • SFSpeechRecognizer, which now offers on-device, unlimited-usage transcription.
  • ML Kit’s Translation API, which likewise offers on-device, unlimited-usage language translation.

These three advances inspired a few projects of mine, in which I’ve combined them in various ways. For instance, I’ve created a speech translator that uses zero data, even while traveling internationally.

As pointed out to me by a reader here on Medium, it only makes sense, then, for me to combine them all into one ‘last’ ultimate example: an on-device subtitle generator for video.

Why Live Subtitling?

I’ve made the case that on-device transcription itself is a huge win for the accessibility community. Likewise, on-device translation helps ‘flatten’ the world by breaking down language barriers and opening up communication opportunities.

The value of live subtitles is that they open up videos (informative or otherwise) from anywhere in the world. They serve to connect people to other communities and cultures. And in the most recent case of the coronavirus (COVID-19), they could serve as a tool for consuming the latest global news and perspectives.

Let’s Get Started

Of course, you’ll need macOS and Xcode 11.x on hand.

We’ll also be working from a previous project of mine (the ancestor of this project, if you will): SwiftUIClosedCaptioning. That project already handles attaching SFSpeechRecognizer to a video’s audio to generate a transcription and print it to our UI (read: closed captioning). For details on how I implemented this, check out the article I wrote here.
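In short, the existing project taps the video’s audio and hands each sample buffer to a buffer-based recognition request, along these lines (a simplified sketch of the idea, not the project’s exact code; the audio-tap plumbing is omitted):

```swift
import Speech
import CoreMedia

// A buffer-based request lets us feed audio as the video plays, rather than from a file.
let request = SFSpeechAudioBufferRecognitionRequest()
request.shouldReportPartialResults = true

// Called for every CMSampleBuffer pulled out of the video's audio tap.
func didReceive(_ sampleBuffer: CMSampleBuffer) {
    request.appendAudioSampleBuffer(sampleBuffer)
}
```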

SpeechTranslatorSwiftUI will be our guide for attaching ML Kit’s Translation API to our closed captioning to produce subtitles. For more details on that project, check out the article I wrote here.

Attaching ML Kit to Our CC

This is going to be quite simple, since so many of the pieces are already in place. The first step is setting up our app with Firebase. To do that, follow the how-to on Firebase’s site. The only exception is that, when you set up your Podfile, you only need the pods for ML Kit’s translation features.
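For reference, here’s roughly what that Podfile can look like. The pod names below are the ML Kit for Firebase translation pods as documented at the time of writing, and the target name is just a placeholder, so double-check both against the current Firebase setup guide:

```ruby
platform :ios, '13.0'

# Target name is a placeholder; use your own project's target.
target 'VideoSubtitles' do
  use_frameworks!

  # Core Firebase plus the ML Kit natural-language / translation pods.
  pod 'Firebase/Analytics'
  pod 'Firebase/MLNaturalLanguage'
  pod 'Firebase/MLNLTranslate'
end
```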

From here, we already have a project that’s set up to play a video, capture the audio into a buffer stream, and have that stream analyzed to produce a transcript. All we need to do is ensure we have a language model downloaded (again, I’m using Italian as the default example) and then take our transcript and run it through our translator. All of this happens in ClosedCaptioning.swift.
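To keep our bearings, here’s roughly the shape that file takes; caption is the property the UI binds to, while the other property names are my assumptions based on the description above:

```swift
import Speech
import Combine
import Firebase

final class ClosedCaptioning: NSObject, ObservableObject {
    // The subtitle text our SwiftUI Text() view displays.
    @Published var caption: String = ""

    // Speech pieces carried over from SwiftUIClosedCaptioning.
    private let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "it-IT"))!
    private let request = SFSpeechAudioBufferRecognitionRequest()
    private var recognitionTask: SFSpeechRecognitionTask?

    // New in this project: the on-device Italian-to-English translator.
    private var translator: Translator!
}
```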

First, we’ll expand our init to check for the language model:
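Here’s a sketch of what that expanded init can look like, using the ML Kit for Firebase translation calls (TranslatorOptions, NaturalLanguage.naturalLanguage().translator(options:), and downloadModelIfNeeded); the print statements are just illustrative:

```swift
override init() {
    super.init()

    // Configure an Italian-to-English translator (Italian is our running example).
    let options = TranslatorOptions(sourceLanguage: .it, targetLanguage: .en)
    translator = NaturalLanguage.naturalLanguage().translator(options: options)

    // Make sure the on-device Italian model is present before we start translating.
    translator.downloadModelIfNeeded { error in
        guard error == nil else {
            print("Language model failed to download: \(error!.localizedDescription)")
            return
        }
        print("Language model ready; okay to translate.")
    }
}
```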

Then we’ll rework the completion handler in our recognitionTask to take the transcription results and pass them into our translator. And since we’re only interested in displaying translated subtitles, we simply set our @Published caption property to the translation results. This will automatically update the Text() view that binds to caption:
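A sketch of that reworked handler, assuming the recognizer, request, and translator properties from the skeleton above (we hop back to the main queue before touching the @Published property):

```swift
recognitionTask = speechRecognizer.recognitionTask(with: request) { [weak self] result, error in
    guard let self = self, let result = result else { return }

    // The best transcription of the (Italian) audio heard so far.
    let transcript = result.bestTranscription.formattedString

    // Run it through ML Kit's translator and publish only the translated text.
    self.translator.translate(transcript) { translatedText, error in
        guard error == nil, let translatedText = translatedText else { return }
        DispatchQueue.main.async {
            self.caption = translatedText   // updates the Text() view bound to caption
        }
    }
}
```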

Run!

That’s it! Now we run our app and see it in action!

Where to Go From Here

Now that we have a basic video player with working subtitles, there’s so much more we can add from here:

  • Language selector with the ability to download the appropriate language model
  • Provide both CC and subtitles, if desired
  • Swap out where we source our videos from (including setting up for streaming)

In wrapping up this series on transcripts, translations, and SwiftUI (for now), my hope is that these projects inspire developers like you to build more polished, production-ready apps that can make a difference. Please feel free to do so and, of course, let me know! I would love to hear stories of what people create and how they’ve impacted others!
