Creating an offline translation Android app using Firebase ML Kit

This tutorial is the 9th part in the series, ML Kit for Mobile Developers. If you’re not quite caught up, you can start here:

Series Pit Stops

Creating a Google Lens clone using Firebase ML Kit
Creating a Credit Card Scanner using Firebase ML Kit
Creating a Barcode Scanner using Firebase ML Kit
Identifying Places in a provided Image using Firebase ML Kit
Building a “Pokédex” clone using Firebase ML Kit and TensorFlow Lite
Recreating Google Translate using Firebase’s ML Kit and Cloud Translate API
Building a real-time object detection app using Firebase ML Kit
Creating an offline translation app using Firebase ML Kit (You are here)
Blink detection in Android using Firebase ML Kit

Introducing ML Kit’s Offline Translation API

Earlier this year at Google I/O, the team behind Firebase ML Kit announced two new APIs: object detection and on-device translation.

While the previous article in this series focused on the object detection API, this one focuses on language translation. Specifically, we’ll look at how to translate text from one language to another in real time using this API, without needing network connectivity!

Yes, this API uses on-device machine learning to perform language translation, which is a step up from the process outlined in an earlier post in this series.

An intuitive use-case that employs this API would be an app that allows people speaking different languages to communicate. Using this API, you can translate a given message into English or another language of the user’s choice.

An example would be Airbnb, which does something very similar: it automatically translates sent and received messages when your host is not a native English speaker.

For this tutorial, we’ll be making an app that detects the text in an image and then translates that text into English.

Take a look at a couple of screenshots from the app that we’ll be making:

Because this API runs locally, results come back with little to no latency once the translation model for a language pair is on the device, and no network connection is needed at translation time.

Getting Started

This app will use the following ML Kit APIs:

Text Detection API: To extract text from the provided image
Language ID API: To identify the language of the extracted text
Translation API: To translate the text into English (or any other language)

Step 1: Set up Firebase in your project and add the required dependencies

This one is simple: set up Firebase in your project. You can find a good tutorial here. To build this app, you also need to add the following dependencies:

dependencies {
  implementation 'com.google.firebase:firebase-ml-natural-language:19.0.0'  
  implementation 'com.google.firebase:firebase-ml-natural-language-language-id-model:19.0.0'  //For detecting the language of text
  implementation 'com.google.firebase:firebase-ml-natural-language-translate:19.0.1'  //For translating text
  implementation 'com.google.firebase:firebase-ml-vision:20.0.0'  //For detecting text from Image
}

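You might also want to add a camera library to your project to make it easier to integrate camera features in your app. I personally recommend CameraView from Otalia Studios (com.otaliastudios.cameraview), which is the library the layout in the next step uses.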

Step 2: Creating a basic layout and adding the camera preview

We need to create a basic layout in our app that hosts a camera preview, along with a simple TextView that will show the translated text.

<androidx.coordinatorlayout.widget.CoordinatorLayout
        xmlns:android="http://schemas.android.com/apk/res/android"
        xmlns:tools="http://schemas.android.com/tools"
        xmlns:app="http://schemas.android.com/apk/res-auto"
        android:layout_width="match_parent"
        android:layout_height="match_parent"
        tools:context=".MainActivity">

    <com.otaliastudios.cameraview.CameraView
            android:layout_width="match_parent"
            android:layout_height="match_parent"
            android:id="@+id/cameraView"/>

    <TextView
            android:text="Please scan some text to proceed"
            android:textSize="16sp"
            android:padding="8dp"
            android:id="@+id/tvTranslatedText"
            android:layout_width="match_parent"
            android:layout_height="wrap_content"/>

</androidx.coordinatorlayout.widget.CoordinatorLayout>

Step 3: Add relevant listeners to the app

Next up, we need to add some listeners to our app that’ll capture a picture so that we can pass that image on to Firebase.

class MainActivity : AppCompatActivity() {

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)
        cameraView.setLifecycleOwner(this)

        // Tap the text view (id tvTranslatedText in the layout) to capture a picture
        tvTranslatedText.setOnClickListener {
            cameraView.takePicture()
        }

        cameraView.addCameraListener(object : CameraListener() {

            override fun onPictureTaken(result: PictureResult) {
                super.onPictureTaken(result)
                result.toBitmap { bitmap ->
                    extractTextFromImage(FirebaseVisionImage.fromBitmap(bitmap!!)) { text ->
                        extractLanguageFromText(text) { lang ->
                            translateTextToEnglish(lang, text) {
                                tvTranslatedText.text = it
                            }
                        }
                    }
                }
            }
        })
    }
    
    //Extract text from the provided image
    private fun extractTextFromImage(image: FirebaseVisionImage, callback: (String) -> Unit) {}
    
    //Extract language from the provided text
    private fun extractLanguageFromText(input: String, cb: (String) -> Unit) {}
    
    //Translate provided text to english
    private fun translateTextToEnglish(lang: String, text: String, callback: (String) -> Unit) {}
    
 }
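To make the nesting of the three callbacks easier to follow, here is a minimal sketch of the same control flow in plain Kotlin, with no Android or Firebase dependency. The `stub…` functions and `runPipeline` are hypothetical stand-ins invented for illustration; they only mimic the shape of the real functions, and the early exit on blank text is an assumption about how you might want to handle an empty scan.

```kotlin
// Hypothetical stand-ins for the three callback-based stages.
// They return canned values instead of calling ML Kit.
fun stubExtractText(scanned: String, callback: (String) -> Unit) = callback(scanned)

fun stubIdentifyLanguage(text: String, callback: (String) -> Unit) =
    callback(if (text.startsWith("Hola")) "es" else "und")

fun stubTranslate(lang: String, text: String, callback: (String) -> Unit) =
    callback(if (lang == "es" && text == "Hola mundo") "Hello world" else text)

// Chains the stages: each one starts only after the previous one reports back.
fun runPipeline(scanned: String, onResult: (String) -> Unit) {
    stubExtractText(scanned) { text ->
        if (text.isBlank()) {
            // Nothing was recognized, so skip language ID and translation.
            onResult("Please scan some text to proceed")
            return@stubExtractText
        }
        stubIdentifyLanguage(text) { lang ->
            stubTranslate(lang, text) { translated -> onResult(translated) }
        }
    }
}

fun main() {
    runPipeline("Hola mundo") { println(it) } // prints "Hello world"
    runPipeline("   ") { println(it) }        // prints "Please scan some text to proceed"
}
```

In the real activity, the final callback is where `tvTranslatedText.text` gets updated.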

Extracting text from the input image

In this step, we’ll use one of the APIs I’ve covered earlier: the Firebase Image-to-Text API, which extracts the text present in an image the user has captured.

The code looks very similar to what we did earlier:

private fun extractTextFromImage(image: FirebaseVisionImage, callback: (String) -> Unit) {
        
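    // Get access to the cloud text recognizer. Note that this one needs a network
    // connection; use onDeviceTextRecognizer instead to keep this step offline.
    val textDetector = FirebaseVision.getInstance().cloudTextRecognizer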

    textDetector.processImage(image)
        .addOnSuccessListener { 
            // Detected text
            Log.d("TEXT", it.text)
            callback(it.text)
        }
        .addOnFailureListener {
            callback("No text found")
            it.printStackTrace()
        }
}
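Since the goal of this app is to work offline, it may be worth trying the on-device recognizer for this step as well. Below is a rough sketch of that variant. The function name `extractTextOnDevice` is made up for illustration, and keep in mind that, as far as I know, the on-device recognizer in this version of ML Kit only handles Latin-script text, so the cloud recognizer may still be the better fit for other scripts.

```kotlin
import android.util.Log
import com.google.firebase.ml.vision.FirebaseVision
import com.google.firebase.ml.vision.common.FirebaseVisionImage

// Hypothetical offline variant of extractTextFromImage.
// Assumption: Latin-script input, which is what the on-device model targets.
private fun extractTextOnDevice(image: FirebaseVisionImage, callback: (String) -> Unit) {
    // Runs entirely on the device, so no network round trip is involved
    val recognizer = FirebaseVision.getInstance().onDeviceTextRecognizer

    recognizer.processImage(image)
        .addOnSuccessListener { result ->
            // result.text holds all the recognized text as a single string
            callback(result.text)
        }
        .addOnFailureListener { e ->
            Log.e("TEXT", "On-device text recognition failed", e)
            callback("No text found")
        }
}
```

The signature matches `extractTextFromImage`, so it can be swapped into the listener from Step 3 without other changes.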

Determining the language of the extracted text

This step consists of taking the extracted text above and finding the source language that it’s written in.

This step is again very similar to what we did in the last blog on using the Firebase Language ID API.

private fun extractLanguageFromText(input: String, cb: (String) -> Unit) {
    
    // Get access to an instance of the language ID api
    val languageId = FirebaseNaturalLanguage.getInstance().languageIdentification
  
    languageId.identifyLanguage(input)
        .addOnSuccessListener {
            // The detected language
            Log.d("LANGUAGE", it)
            cb(it)
        }
        .addOnFailureListener {
            // If identification fails with an error, fall back to English
            cb("en")
            it.printStackTrace()
        }
}
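One detail worth knowing here: when the Language ID API can’t determine the language, it still succeeds and reports the code `und` (undetermined) instead of failing. The translation step further down falls back to English when it gets a code it doesn’t recognize, but you may prefer to make that decision explicit. Here is a small sketch of such a helper in plain Kotlin; `resolveSourceLanguage` is a hypothetical name, not part of ML Kit.

```kotlin
// Hypothetical helper (not an ML Kit API): turns the raw language ID result
// into a usable source language code, falling back when nothing was detected.
fun resolveSourceLanguage(detected: String?, fallback: String = "en"): String {
    val code = detected?.trim().orEmpty()
    // "und" is the code reported for an undetermined language
    return if (code.isEmpty() || code == "und") fallback else code
}

fun main() {
    println(resolveSourceLanguage("es"))                   // prints "es"
    println(resolveSourceLanguage("und"))                  // prints "en"
    println(resolveSourceLanguage(null, fallback = "fr"))  // prints "fr"
}
```

You could call it inside the success listener, for example `cb(resolveSourceLanguage(it))`.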

Translating the input text to English

Next, we’ll take the output of the two steps above and feed them to the Firebase Translation API.

private fun translateTextToEnglish(lang: String, text: String, callback: (String) -> Unit) {

        val options = FirebaseTranslatorOptions.Builder()
            // Set the source language,  the language of the source material
            .setSourceLanguage(FirebaseTranslateLanguage.languageForLanguageCode(lang) ?: FirebaseTranslateLanguage.EN)
            // Set the target language, the language in which you want the translated text to be
            .setTargetLanguage(FirebaseTranslateLanguage.EN)
            .build()
        val translator = FirebaseNaturalLanguage.getInstance().getTranslator(options)

        translator.downloadModelIfNeeded()
            .addOnSuccessListener {
                // If the model downloads, translate the text
                translator.translate(text)
                    .addOnSuccessListener {
                        // it here is the translated text
                        callback(it)
                    }
                    .addOnFailureListener {
                        callback("Failed to translate")
                        it.printStackTrace()
                    }
            }
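            .addOnFailureListener {
                // The model could not be downloaded, so report it instead of failing silently
                callback("Failed to download the translation model")
                it.printStackTrace()
            }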

    }

As you can see, there are 3 parts to this process:

Selecting the source and target languages
Downloading the model for the source-target combination if it doesn’t already exist
Translating the text once the model is downloaded.
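The second part deserves a note: the model for a language pair has to be fetched once over the network before translation can run offline. If you’d rather not pull models over mobile data, something like the sketch below might help. It assumes the `downloadModelIfNeeded` overload that accepts `FirebaseModelDownloadConditions`; `prepareTranslator` and its `onReady` callback are hypothetical names used only for this example.

```kotlin
import com.google.firebase.ml.common.modeldownload.FirebaseModelDownloadConditions
import com.google.firebase.ml.naturallanguage.translate.FirebaseTranslator

// Hypothetical helper: makes sure the model is on the device before translating.
// Assumption: downloading only over Wi-Fi is acceptable for your users.
fun prepareTranslator(translator: FirebaseTranslator, onReady: (Boolean) -> Unit) {
    val conditions = FirebaseModelDownloadConditions.Builder()
        .requireWifi() // skip the download on metered connections
        .build()

    translator.downloadModelIfNeeded(conditions)
        .addOnSuccessListener { onReady(true) }   // model is available locally
        .addOnFailureListener { onReady(false) }  // no model yet, so translation can't run
}
```

Calling this early, for example when the app starts, means the model is more likely to be available by the time the user scans some text.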

And that’s it!

With these 3 easy steps, we can now build a simple yet effective app that:

Identifies text from an image
Determines the language of that text
And translates that text into a language of our choice (here English).

If you want to play around with the app shown in the screenshots, you can build it from the GitHub repository linked below, and it should work well after adding it to your own Firebase project.
