ELAN 6.5 Whisper recognizer

Hi! I couldn’t find any discussion about the new Whisper recognizer in ELAN 6.5, so I started a new topic…
The problem is that I installed Whisper AI on my laptop and it transcribes well in different languages when I run it from PyCharm, but I still can’t manage to use it inside ELAN. It’s always the same error:
cannot run the recognizer - Create process error=2

Hi, I guess no-one mentioned Whisper here before indeed. It is a somewhat experimental extension and probably not easy to get working.
When the Whisper recognizer is selected in the Recognizer tab, the ? button shows a short HTML page with more or less all the information there is. You’ve probably seen this, but for anyone else: the notes or ‘known issues’ section at the end gives some examples of the sometimes complicated command that has to be entered to invoke Whisper properly. The command depends on the system setup, the environment, the path and so on, and may not be intuitive.
I wonder if the Report button provides more information than you mention (but maybe that error is from the report).

It would be nice if Whisper could be run from inside ELAN, but it might be more productive to run it separately and import the results into ELAN. By the way: the extension was created about half a year ago, and it’s not certain that everything (including import) still works with more recent versions of Whisper.

-Han

I’m so glad that somebody responded!

I didn’t know there was a ‘?’ button inside the recognizer… Thanks to your advice I managed to actually launch the recognizer inside ELAN. The main issue was writing the path to whisper exactly right: I installed it in an environment in PyCharm and copied that path to the command line in ELAN. It’s also important to mention that you have to have ffmpeg and torch (by the PyTorch team) installed in the same environment, in my case in PyCharm.
With that, the recognizer works with mp3 and wav files; for mp4 you also need to install numba in the environment.
Now I’m working on setting up speaker diarization with whisper, as it only creates one tier after recognition.

Thank you so much, Han! I’ve been struggling for nearly a month!
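
A note for anyone who runs into the same "Create process error=2": on Windows that code usually means the program named in the command could not be found, which fits the path fix described above. Below is a minimal Python sketch for printing the absolute paths of the tools. It assumes it is run with the interpreter of the environment where whisper is installed; `resolve_tool` is a hypothetical helper name, not part of ELAN or Whisper.

```python
import shutil
from pathlib import Path


def resolve_tool(name, extra_dirs=()):
    """Return the absolute path of an executable, or None if it is not found.

    Looks on PATH first, then in any extra directories (for example the
    Scripts or bin folder of a virtual environment).
    """
    found = shutil.which(name)
    if found:
        return str(Path(found).resolve())
    for directory in extra_dirs:
        found = shutil.which(name, path=str(directory))
        if found:
            return str(Path(found).resolve())
    return None


if __name__ == "__main__":
    # whisper and ffmpeg are the two programs mentioned in this thread.
    for tool in ("whisper", "ffmpeg"):
        path = resolve_tool(tool)
        print(f"{tool}: {path or 'NOT FOUND'}")
```

If a tool is reported as not found, the full path to it probably has to be entered in ELAN, as the examples on the ? page suggest.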

Sorry to hear it took you so long to get it working. I hope that month includes getting Whisper to work at all, not only inside ELAN!
We’ll have to see what the newer version(s) of Whisper provide as results (separate speaker output?) and whether the ELAN extension needs to be updated to import these results as well as possible.

Yes, I mean it took a month from the moment I first saw the whisper recognizer inside ELAN and decided to find out what it is and how it works.
Well, separate speaker output would be just perfect, let’s hope it really comes out soon!
I also have to say that transcribing a video/audio file with good sound and one speaker using the whisper recognizer in ELAN takes about an hour on the CPU. I suppose using a GPU would help with this.
Anyway, this feature definitely saves a lot of time!

p.s.

Using whisper in PyCharm, exporting the result in SRT format and importing it into ELAN seems to work much faster.
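
For readers who want to try this route, here is a rough sketch of it in Python. It assumes the openai-whisper package is installed and that the result of `transcribe` contains a "segments" list with "start", "end" and "text" entries; the function names and the "base" model are only illustrative choices, not what anyone in this thread necessarily used.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render segments (dicts with 'start', 'end', 'text') as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)


def transcribe_to_srt(media_path, srt_path, model_name="base"):
    """Transcribe a file with whisper and write the segments to an SRT file."""
    import whisper  # imported here so the helpers above work without it

    model = whisper.load_model(model_name)
    result = model.transcribe(media_path)
    with open(srt_path, "w", encoding="utf-8") as handle:
        handle.write(segments_to_srt(result["segments"]))
```

The resulting .srt file can then be imported into ELAN as described above.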

Yes, the latter is probably the best workflow for production work.
The ELAN extension is a kind of ‘wrapper’ which calls whisper from the command line in the end. I would guess that performance wouldn’t need to be too different from running it directly from the command line. Anyway, I guess the extension in ELAN can be useful for testing different parameter settings, to let it process a few minutes of the recording and see the results etc.
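
To illustrate the idea of a wrapper around the command line, and of testing on a few minutes first, here is a small Python sketch. It is not the code of the ELAN extension. It assumes that `whisper` and `ffmpeg` are both on the PATH; the clip file name, the 180 seconds and the function names are made up for the example.

```python
import subprocess


def clip_command(source, clip, seconds=180):
    """Build an ffmpeg command that copies the first `seconds` of audio."""
    return ["ffmpeg", "-y", "-i", source, "-t", str(seconds), "-vn", clip]


def whisper_command(media, out_dir, model="base", language=None, device=None):
    """Build a whisper command line that writes an SRT file to out_dir."""
    cmd = ["whisper", media, "--model", model,
           "--output_format", "srt", "--output_dir", out_dir]
    if language:
        cmd += ["--language", language]
    if device:
        # e.g. "cuda" to use a GPU, if the installed torch build supports it
        cmd += ["--device", device]
    return cmd


def run_trial(source, out_dir, **options):
    """Cut a short clip and transcribe only that clip."""
    clip = "trial_clip.wav"  # hypothetical temporary file name
    subprocess.run(clip_command(source, clip), check=True)
    subprocess.run(whisper_command(clip, out_dir, **options), check=True)
```

Calling `run_trial("recording.mp4", "out", model="small")` would then try one setting on the first three minutes only, which makes it cheaper to compare models or languages before processing the whole recording.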

This extension is very useful indeed! I just noticed one strange thing… the segments this recognizer makes are usually about 2 seconds each, but what is trickier is that whisper doesn’t recognize silence… I mean it fills the whole timeline with segments, even where there was no speech… Maybe you know something about this? Maybe it can somehow be set in the advanced settings?

Yes, I recognize that from the tests with the version of Whisper I have; the start time is often fairly accurate but the end time often isn’t. I assumed that newer versions of Whisper produce more accurate results, but maybe this is in a layer of the results that is not (yet) picked up by the ELAN extension. I heard about word level segmentation in Whisper, but maybe that was in combination with a forced alignment extension.
I’ll try to update my Whisper version one of these days.
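
One thing that might be worth trying when running Whisper from Python rather than from ELAN: each segment in the result carries a "no_speech_prob" value, and with `word_timestamps=True` recent versions also add a "words" list with per-word times. The sketch below is untested against the ELAN extension; the 0.6 threshold and the function names are assumptions for illustration.

```python
def drop_silent_segments(segments, max_no_speech_prob=0.6):
    """Keep only segments that whisper does not rate as probable silence."""
    return [
        seg for seg in segments
        if seg.get("no_speech_prob", 0.0) <= max_no_speech_prob
    ]


def tighten_segment(seg):
    """Shrink a segment to its first and last word, if word times exist."""
    words = seg.get("words") or []
    if not words:
        return dict(seg)  # nothing to go on, return an unchanged copy
    return {**seg, "start": words[0]["start"], "end": words[-1]["end"]}


def clean_segments(segments, max_no_speech_prob=0.6):
    """Drop probable silence, then tighten the remaining segments."""
    kept = drop_silent_segments(segments, max_no_speech_prob)
    return [tighten_segment(seg) for seg in kept]
```

With the openai-whisper package this would be applied as `clean_segments(model.transcribe(path, word_timestamps=True)["segments"])` before exporting. Whether it removes all the segments over silence will depend on the recording.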