Identify the language of text with ML Kit on Android

Identify the language of user-provided text using ML Kit

ML Kit can identify the language of a string of text. The API supports over 100 languages and can recognize some of them, such as Russian and Arabic, in both native and romanized script. Given a string, it returns the most likely languages along with a confidence value for each. Let’s look at how that can be done.

Getting Started

Start by adding the ML Kit language identification dependency to your app/build.gradle file.

dependencies {
  // ...

  implementation 'com.google.mlkit:language-id:16.1.1'
}

The App Elements

The application consists of a text input, a button, and a text view. The text input collects a string from the user. The button triggers language identification through its onClick handler, predictLanguage, and the text view displays the result.

<?xml version="1.0" encoding="utf-8"?>
<androidx.constraintlayout.widget.ConstraintLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context=".MainActivity">

    <EditText
        android:id="@+id/editText"
        android:layout_width="229dp"
        android:layout_height="56dp"
        android:layout_marginStart="91dp"
        android:layout_marginTop="100dp"
        android:layout_marginEnd="91dp"
        android:layout_marginBottom="196dp"
        android:ems="10"
        android:hint="Sentence"
        android:inputType="text"
        app:layout_constraintBottom_toTopOf="@+id/button"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toTopOf="parent" />

    <TextView
        android:id="@+id/textView"
        android:layout_width="240dp"
        android:layout_height="70dp"
        android:layout_marginStart="91dp"
        android:layout_marginTop="36dp"
        android:layout_marginEnd="80dp"
        android:layout_marginBottom="199dp"
        android:text="Language:"
        app:layout_constraintBottom_toBottomOf="parent"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/button" />

    <Button
        android:id="@+id/button"
        android:layout_width="201dp"
        android:layout_height="71dp"
        android:layout_marginStart="91dp"
        android:layout_marginTop="60dp"
        android:layout_marginEnd="119dp"
        android:onClick="predictLanguage"
        android:text="Click"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/editText" />
</androidx.constraintlayout.widget.ConstraintLayout>

Create LanguageIdentifier Instance

The first step is to create an instance of the LanguageIdentifier class. Here the instance is configured with a confidence threshold of 0.5 (50%). By default, identifyPossibleLanguages returns every language whose confidence is at least 0.01. Raising the threshold to 0.5 means that only languages identified with at least 50% confidence are returned. If no language meets the threshold, the und (undetermined) code is returned instead.

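// Read the text the user typed into the input field.
String text = editText.getText().toString();

// Only report languages identified with at least 50% confidence.
LanguageIdentifier languageIdentifier = LanguageIdentification.getClient(
        new LanguageIdentificationOptions.Builder()
                .setConfidenceThreshold(0.50f)
                .build());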
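To make the effect of the threshold concrete, here is a minimal plain-Java sketch of the filtering behavior described above. It is not ML Kit's implementation: the Candidate and ThresholdSketch classes are hypothetical stand-ins, the inclusive comparison is an assumption, and the confidence attached to the und fallback is only a placeholder.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one identified language; not an ML Kit type. */
final class Candidate {
    final String languageTag;
    final float confidence;

    Candidate(String languageTag, float confidence) {
        this.languageTag = languageTag;
        this.confidence = confidence;
    }
}

/** Illustrates the threshold semantics only; the real filtering happens inside ML Kit. */
final class ThresholdSketch {
    static final String UNDETERMINED = "und";

    /**
     * Keeps the candidates whose confidence is at or above the threshold.
     * If none qualify, returns a single "und" entry.
     */
    static List<Candidate> applyThreshold(List<Candidate> candidates, float threshold) {
        List<Candidate> kept = new ArrayList<>();
        for (Candidate candidate : candidates) {
            // Assumption: the comparison is inclusive.
            if (candidate.confidence >= threshold) {
                kept.add(candidate);
            }
        }
        if (kept.isEmpty()) {
            // Placeholder confidence; the value ML Kit reports for "und" is not assumed here.
            kept.add(new Candidate(UNDETERMINED, 0.0f));
        }
        return kept;
    }
}
```

With made-up candidates of en at 0.72, nl at 0.21, and de at 0.04, a threshold of 0.50 keeps only en, a threshold of 0.01 keeps all three, and a threshold of 0.90 leaves only the und fallback.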

Obtain Possible Languages

Call the identifyPossibleLanguages method to get the candidate languages. The call is asynchronous: on success, the listener receives a list of IdentifiedLanguage objects, and the code and confidence of each one are appended to the text view.

languageIdentifier.identifyPossibleLanguages(text)
        .addOnSuccessListener(new OnSuccessListener<List<IdentifiedLanguage>>() {
            @Override
            public void onSuccess(List<IdentifiedLanguage> identifiedLanguages) {
                // Each entry carries a BCP-47 language tag and a confidence value.
                for (IdentifiedLanguage identifiedLanguage : identifiedLanguages) {
                    String language = identifiedLanguage.getLanguageTag();
                    float confidence = identifiedLanguage.getConfidence();
                    textView.append(" " + language + " " + confidence);
                    Log.i("TAG", language + " (" + confidence + ")");
                }
            }
        })
        .addOnFailureListener(new OnFailureListener() {
            @Override
            public void onFailure(@NonNull Exception e) {
                // Model couldn't be loaded or other internal error.
                Log.e("TAG", "Language identification failed", e);
            }
        });
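The text view above shows raw language codes such as fr, which may not mean much to users. As one possible refinement, the sketch below turns a code into an English language name with java.util.Locale and formats the confidence as a percentage. LanguageLabel and its methods are hypothetical helpers introduced for illustration, not part of ML Kit, and the "Undetermined" label is an arbitrary choice.

```java
import java.util.Locale;

/** Hypothetical helper for presenting identification results; not part of ML Kit. */
final class LanguageLabel {

    /** Maps a BCP-47 tag such as "fr" to an English language name. */
    static String displayName(String languageTag) {
        if (languageTag == null || languageTag.isEmpty() || "und".equals(languageTag)) {
            return "Undetermined";
        }
        String name = Locale.forLanguageTag(languageTag).getDisplayLanguage(Locale.ENGLISH);
        // Fall back to the raw tag if no display name is available.
        return name.isEmpty() ? languageTag : name;
    }

    /** Formats a tag and a confidence in [0, 1] as, for example, "Spanish (50%)". */
    static String describe(String languageTag, float confidence) {
        return String.format(Locale.ROOT, "%s (%.0f%%)",
                displayName(languageTag), confidence * 100f);
    }
}
```

Inside onSuccess, the append call could then pass LanguageLabel.describe(language, confidence) in place of the raw code and value.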

Conclusion

As illustrated here, using Google’s ML Kit to detect the language of a piece of text is quite straightforward. You can find the entire source code for this piece below.

Fritz
