Hello Transcribe 2.2 — now with CoreML 🚀

13 Apr 2023

Introduction

tl;dr Hello Transcribe version 2.2.x now uses CoreML, which makes it 3x-7x faster. It’s available on the App Store.

Georgi Gerganov (and others) have added CoreML support to Whisper.cpp, and it’s a game-changer for speed.

If you want to try it out in Whisper.cpp, there’s a “coreml” branch and a pull request with instructions. It hasn’t been merged into main yet, but it’s stable as far as I can tell.

I won’t go into too many technical details as I don’t have the depth of understanding, but essentially the encoder part of the transformer runs in CoreML, while the decoder and other supporting functions (e.g. the mel spectrogram) run on the CPU.

The result is a 3x-7x speed-up in Hello Transcribe.

One thing I noticed during development is that automatic language detection causes the encoder to run twice, so if you specify the language in the options when using a multilingual model you will see a performance improvement.

I also realised that setting the language to “auto” when using an English model incurs the same unnecessary performance penalty. As of 2.2 the language is set to “English” automatically when using an English model.
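For illustration, here’s a sketch using the standard Whisper.cpp C API (`whisper_full_default_params` and the `language` field of `whisper_full_params`; I haven’t verified these against the “coreml” branch, and the helper function name is mine):

```c
#include "whisper.h"

// Sketch: build parameters with a pinned language so Whisper.cpp does not
// run the encoder an extra time for automatic language detection.
static struct whisper_full_params params_for(const char *lang) {
    struct whisper_full_params p =
        whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    // "auto" triggers language detection, which costs an extra encoder pass;
    // a fixed language code such as "en" or "de" skips it.
    p.language = lang;
    return p;
}
```

The returned struct would then be passed to `whisper_full` as usual; the only point here is that `language` should be a concrete code rather than “auto” whenever you know the input language.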

There are two caveats:

  1. Users need to download the new models which include the compiled CoreML model.
  2. The first time the CoreML model is loaded it is optimized for the device. This can take a while (up to a minute on my older iPhone 12 Pro), but subsequent loads are fast.

Benchmarks

Below are some informal benchmarks, running on my iPhone 14 Pro. For English the speed-up is more than 7x (in part because I removed the redundant encoder call mentioned above); for German it’s about 4x.

“small” is now faster than “tiny” was previously, which is a HUGE upgrade.

The multilingual performance is the most exciting part of this upgrade. Running at least “small” is required for decent multilingual transcription and previously it would be quite slow. Now decent multilingual dictation is real-time.

According to the published WER numbers, “small” performs better on Spanish than on English, and German “small” performs as well as English “tiny” at the same speed:

WER on MLS

Here’s a side-by-side screen recording of Hello Transcribe 2.1 vs 2.2:

M1 Max vs A16 Bionic

Here’s an interesting tidbit: Whisper.cpp+CoreML is faster on my iPhone 14 Pro (A16 Bionic) than on my MacBook Pro (M1 Max), likely because the M1 is based on the A14, the chip found in the iPhone 12 Pro:

Whisper.cpp vs iOS Dictation

What about the built-in iOS dictation? Here’s a video of the built-in iOS dictation in Notes vs Hello Transcribe. I’m playing the audio through my laptop’s speakers, so it’s not a very scientific test. Notes really struggles in this scenario:

Conclusion

Thanks to the amazing work by Georgi Gerganov and others, this is a major leap forward in Hello Transcribe’s ability to produce fast, accurate transcriptions.

Try it out and let me know what you think. I’m especially interested in the performance of multilingual transcriptions.

As always, if you email me I’ll add you to the TestFlight Beta group and you don’t have to pay for Pro, no questions asked.

Get it on the App Store now.