Comparing iOS Text Recognition SDKs Using Delta

A month ago, I wrote a post introducing an open-source package, react-native-text-detector. In that tutorial, we built a simple business card app using the package. The package used different libraries for detection on Android and iOS: Firebase’s ML Kit on Android, and Tesseract OCR along with Core ML on iOS. The major reason for this is mentioned here.

Now that the conflict between ML Kit and React Native on iOS has been resolved, we need to settle on a single solution for both platforms. To make an informed decision, we had to analyze the performance of both libraries on each platform.

Introducing Delta — Better Mobile ML Decisions

To do so, I’ll use Delta, a platform I’ve been working on that will help developers analyze different ML models against datasets to decide what models can be beneficial for their apps. Delta will allow developers to download models on the fly and run them on their device against available datasets. Delta will also present a comparative analysis between the performance of models. (Delta is still in the final stages of development; I’m hopeful for a beta launch in the next few weeks.)

For this analysis, we ran a dataset of 959 images through both Tesseract OCR and Firebase’s ML Kit to see which one performed better on three measures: successful recognition, time taken for recognition, and accuracy (calculated using Levenshtein distance).
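To make the accuracy measure concrete, here is a minimal sketch of a Levenshtein-based correctness score in TypeScript. The Levenshtein distance itself is the standard edit-distance algorithm; the `correctness` helper and its normalization by the longer string's length are assumptions for illustration, since the exact formula used in the analysis isn't spelled out here.

```typescript
// Standard dynamic-programming Levenshtein distance, keeping only two rows.
function levenshtein(a: string, b: string): number {
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;
  // prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper (assumed normalization): 100 means an exact match,
// 0 means every character would have to change.
function correctness(expected: string, recognized: string): number {
  const longest = Math.max(expected.length, recognized.length);
  if (longest === 0) return 100;
  return (1 - levenshtein(expected, recognized) / longest) * 100;
}
```

With this definition, a four-letter word recognized with one wrong character scores 75. Other normalizations (for example, dividing by the expected word's length only) are equally plausible and would shift the percentages slightly.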

A little background

Usually, optical character recognition (OCR) involves two steps: first, detecting the bounding boxes that contain text; second, interpreting the contents of those bounding boxes as paragraphs, lines, and words.

In order to simplify this process, we’ve cropped all images to bounding boxes so the libraries can focus more on recognition and less on detection. We’ve also made sure that each image contains just a single word to make this process simpler. Some sample images:

We iterated over our dataset and passed each image to both libraries. RNTextDetector’s comparison branch exposes the same API for both of them. If text is detected in an image, it’s returned along with the time taken to detect it.
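The loop described above can be sketched roughly as follows. This is not RNTextDetector's actual API: the `Recognizer` type, the `RunResult` shape, and `runBenchmark` are hypothetical names, and the sketch only assumes that each library can be wrapped in a function that resolves to the recognized text, or to `null` when nothing is detected.

```typescript
// Hypothetical wrapper around a library: resolves to the recognized text,
// or null when no text was detected in the image.
type Recognizer = (imagePath: string) => Promise<string | null>;

interface RunResult {
  imagePath: string;
  library: string;
  text: string | null;
  elapsedMs: number;
}

async function runBenchmark(
  imagePaths: string[],
  recognizers: Record<string, Recognizer>,
  now: () => number = Date.now // injectable clock, handy for testing
): Promise<RunResult[]> {
  const results: RunResult[] = [];
  for (const imagePath of imagePaths) {
    // Run the libraries one after another so they don't compete for the CPU
    // and distort each other's timings.
    for (const library of Object.keys(recognizers)) {
      const start = now();
      const text = await recognizers[library](imagePath);
      results.push({ imagePath, library, text, elapsedMs: now() - start });
    }
  }
  return results;
}
```

Running the recognizers sequentially is a deliberate choice in this sketch: wall-clock timings are only comparable if each call has the device to itself.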

Summary of results

ML Kit outperforms Tesseract OCR by a large margin, as seen above. Though it consumes more CPU time (which might be an issue for real-time detection), its significant lead in correctness overshadows this drawback.

Even in cases where ML Kit failed to recognize the text correctly, its correctness percentages were much higher than those of Tesseract OCR’s failures.

Diving a little deeper

To get a more complete picture of where each library performs better, we need to break the results down by outcome.

Success for both — Total 288

A total of 288 images were recognized perfectly by both libraries. For these matches, Tesseract OCR was much quicker than Firebase’s ML Kit.

Failure for both — Total 207

In 207 images, text was detected, but neither library recognized it perfectly. ML Kit was better in this case, as it had a higher correctness rate.

Misery for both — Total 15

In 15 images, neither library was able to detect any text.

Firebase was better — Total 316

In 316 images, Firebase’s ML Kit performed perfectly while Tesseract OCR was unable to recognize the text correctly.

Tesseract took the lead — Total 29

In 29 images, Tesseract OCR performed perfectly while Firebase’s ML Kit was unable to recognize the text correctly. Even though ML Kit didn’t recognize these images perfectly, its correctness levels on a number of them were higher than those we saw in the previous comparison.
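The five groups above can be reproduced with a small classifier over the per-image results. This is a sketch under assumptions: `classify`, `tally`, and the bucket names are hypothetical, "perfect" is taken to mean an exact string match with the expected word, and `null` stands for "no text detected". The extra `other` bucket covers the case where one library detects nothing while the other recognizes the text imperfectly, which the breakdown above doesn't describe.

```typescript
type Bucket =
  | "bothPerfect"      // Success for both
  | "bothImperfect"    // Failure for both
  | "neitherDetected"  // Misery for both
  | "mlKitOnly"        // Firebase was better
  | "tesseractOnly"    // Tesseract took the lead
  | "other";           // assumption: not covered by the groups in the article

function classify(
  expected: string,
  tesseract: string | null,
  mlKit: string | null
): Bucket {
  // Assumed definition of a perfect recognition: exact, case-sensitive match.
  const tesseractOk = tesseract === expected;
  const mlKitOk = mlKit === expected;
  if (tesseractOk && mlKitOk) return "bothPerfect";
  if (mlKitOk) return "mlKitOnly";
  if (tesseractOk) return "tesseractOnly";
  if (tesseract === null && mlKit === null) return "neitherDetected";
  if (tesseract !== null && mlKit !== null) return "bothImperfect";
  return "other";
}

function tally(
  rows: Array<{ expected: string; tesseract: string | null; mlKit: string | null }>
): Record<Bucket, number> {
  const counts: Record<Bucket, number> = {
    bothPerfect: 0,
    bothImperfect: 0,
    neitherDetected: 0,
    mlKitOnly: 0,
    tesseractOnly: 0,
    other: 0,
  };
  for (const row of rows) {
    counts[classify(row.expected, row.tesseract, row.mlKit)] += 1;
  }
  return counts;
}
```

Whether matching should ignore case or surrounding whitespace is a judgment call; a looser match would move some images from the failure groups into the success groups.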

Some other aspects

It’s not just about how well a library performs. Other factors can also influence the decision to choose one library over the other.

App Size

Though we’re living in an age where manufacturers focus on making devices more AI-efficient, many people still have low-end devices with limited storage. Firebase’s ML Kit adds more to your app’s size than Tesseract OCR does.

Flexibility

Image Modification: Tesseract OCR comes with a built-in feature for modifying images to make them easier and quicker to recognize. Firebase’s ML Kit offers developers no such tool.
Custom Models: Tesseract OCR has its own set of models that you can import into your project straight away, according to your requirements. ML Kit, on the other hand, comes with its own pre-built models. If none of those meet your needs, you can use a custom TensorFlow Lite model, though that requires some prior machine learning experience.

Security

Tesseract OCR requires you to set ENABLE_BITCODE to NO when archiving your app on iOS. This conflicts with many obfuscation and runtime protection tools. ML Kit has no such requirement, making it more suitable for financial and other security-oriented apps.

Who’s the winner?

It’s not just about how these libraries perform on one platform. Most of the time, developers are concerned about implementation on both Android and iOS. We performed similar analyses on Android using Delta and here are the results.

Before we declare a clear winner, we’ll take a closer look at performance details on Android soon.