Including my own ASR output


My name is Claudia. I am testing ELAN as a tool for annotating Automatic Speech Recognition (ASR) output in Dutch. I have the audio files and their automatic orthographic transcriptions, and I would like to use ELAN to correct those transcriptions into a better lexical form.
I read the documentation but cannot find a way to load the automatic transcriptions into ELAN and work on them. Is it possible to do so?
If it is, is it also possible to align the transcription with the audio signal in ELAN? Right now I only have a txt file with the text; it has no punctuation or alignment.

Thanks in advance!


If there is no alignment information in the ASR output, you can try one of the forced alignment recognizers that are integrated in ELAN.
In the Recognizer tab (in the right top area of the window) there are two recognizers you can try:

  • the Text-to-Speech alignment supports Dutch (not sure if it works well if there is no punctuation)
  • the BAS WebMAUS basic recognizer also supports (several variants of) Dutch

In both cases the files (the audio and the text) are uploaded to a web service, so it would be best to try them with a few short recordings.

It would probably be better if the ASR system could be configured to also output time alignment information, if that is an option.



Thanks a lot for your reply. I will check it out.


Hi hasloe

I just tried the recognizers and I get an error inside the recognizer (see image). Do you have an idea why?

Hi Claudia,

I’m afraid I have an idea of what’s wrong, after your report and after trying it myself. Most recognizers are developed as extensions/plug-ins; they aren’t an integral part of ELAN. I think that the recognizer can’t find an executable it needs to run (namely the java executable). This is nowadays more likely than it used to be, but I see that this is something we have to fix for a future version.
Can you try the other one, WebMAUS, or did you already do that? (I would assume that one will work.) If that one also stops with an error, I can give you some directions on how to get the first one to work (by editing a configuration file, it’s a bit nasty).
Sorry for the inconvenience…


Yes, I also tried WebMAUS and I get the same error.
Is it possible to use a CSV file containing the begin and end times of each phrase to set up a tier and work on the transcriptions?

Hmm, maybe I was wrong then with my assumption. If you, after dismissing the error message, click the Report button in the recognizer panel, what is the main error message in the report?

It is possible to import a CSV file with start and end times of phrases; the import will create a tier with annotations. The CSV can also contain the phrases themselves, as the contents of the annotations. But I thought the problem was that you don’t have the alignment information? If the ASR system can produce a CSV, you can import that into ELAN (that was what I meant to suggest in my first reply).

Yes, indeed, I think it is easier if I create the CSV file from the ASR output (I don’t get a CSV, but I could generate one automatically) and import that into ELAN. Can you tell me how to import it? Are there any specific requirements for creating the CSV?

There is no fixed, required setup for the CSV. Instead, you have to specify what information is in which column when you start the import of the CSV file.
Start the import via File->Import->CSV/Tab-delimited Text File… For the configuration, please have a look at the relevant section of the manual.
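For illustration, here is a minimal Python sketch of how such a CSV could be generated from ASR output. Everything specific in it is an assumption rather than something ELAN requires: the segment list, the tier name "ASR", the column order (tier, begin, end, text), the use of seconds for the times, and the output file name. Since the columns are assigned during the import, the order itself should not matter; please check the manual for the time formats the import accepts.

```python
import csv
import io

# Hypothetical ASR segments: (begin in seconds, end in seconds, text).
segments = [
    (0.00, 1.85, "de jongen voor wie we onze handen wassen"),
    (1.85, 3.40, "linde volgt nick op pyjamadag"),
]


def segments_to_csv(segments, tier="ASR"):
    """Return CSV text with one row per segment: tier, begin, end, text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for begin, end, text in segments:
        if end < begin:
            raise ValueError(f"segment ends before it begins: {begin} > {end}")
        # Times are written in seconds with millisecond precision (an assumption).
        writer.writerow([tier, f"{begin:.3f}", f"{end:.3f}", text])
    return buf.getvalue()


if __name__ == "__main__":
    # The output file name is only an example.
    with open("asr_segments.csv", "w", encoding="utf-8", newline="") as f:
        f.write(segments_to_csv(segments))
```

Each row then becomes one annotation on the tier, with the phrase as its content.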

PS I would still be interested in the error message in the report of the recognizers, if you can find some time.

Thanks a lot for your help.

The report is this one

Starting process at 16 Sep 2020 16:35:46
Server url:
Media file:	/Users/cmatosv/Documents/FlemishSpeechToText/resources/linde/audio/DE_JONGEN_VOOR_WIE_WE_ONZE_HANDEN_WASSEN._—_Linde_Volgt_Nick_op_pyjamadag.wav
Text file:	/Users/cmatosv/Documents/FlemishSpeechToText/resources/linde/transcription/DE_JONGEN_VOOR_WIE_WE_ONZE_HANDEN_WASSEN._—_Linde_Volgt_Nick_op_pyjamadag.wav.txt
Service url:
Form data Parameters: 
content-disposition: form-data; name="LANGUAGE"


Form data SIGNAL: --DaDa0x
content-disposition: form-data; name="SIGNAL"; filename="DE_JONGEN_VOOR_WIE_WE_ONZE_HANDEN_WASSEN._—_Linde_Volgt_Nick_op_pyjamadag.wav"

Form data TEXT: 
content-disposition: form-data; name="TEXT"; filename="DE_JONGEN_VOOR_WIE_WE_ONZE_HANDEN_WASSEN._—_Linde_Volgt_Nick_op_pyjamadag.wav.txt"

Uploading of files took: 0.862 sec
Parsing results started after: 2.026 sec
Running MAUS was unsuccessful. An unknown error occurred.
Output message: Could not execute the WebMAUS Basic Wrapper! Command used: <br/>nice -n 15 maus.pipe PIPE=G2P_MAUS SIGNAL=/var/lib/tomcat8/webapps/BASWebServices##2.43//data/2020.09.16_16.35.46_147BBBA33781F596DE0C8CD7231D5EC9/DE_JONGEN_VOOR_WIE_WE_ONZE_HANDEN_WASSEN._—_Linde_Volgt_Nick_op_pyjamadag.wav LANGUAGE=nld-BE INSKANTEXTGRID=true RELAXMINDUR=false OUTFORMAT=TextGrid PRESEG=true USETRN=false TARGETRATE=100000 TEXT=/var/lib/tomcat8/webapps/BASWebServices##2.43//data/2020.09.16_16.35.46_147BBBA33781F596DE0C8CD7231D5EC9/DE_JONGEN_VOOR_WIE_WE_ONZE_HANDEN_WASSEN._—_Linde_Volgt_Nick_op_pyjamadag.wav.txt INSORTTEXTGRID=true NOINITIALFINALSILENCE=false OUT=/var/lib/tomcat8/webapps/BASWebServices##2.43//data/2020.09.16_16.35.46_147BBBA33781F596DE0C8CD7231D5EC9/DE_JONGEN_VOOR_WIE_WE_ONZE_HANDEN_WASSEN._—_Linde_Volgt_Nick_op_pyjamadag.TextGrid  <br/>StdOut:  <br/>StdErr: ERROR: textEnhance: argument -i/--infile: the input file name must not contain any whitespace, non-ASCII characters, or POSIX regular expression metacharacters: *+?()[]{}\|^$ (dots are allowed)<br/>WARNING: maus.pipe : G2P does not support language nld-BE - using nld-NL instead<br/><br/>ERRORS:<br/>+ input file /tmp/3211_1600266947_DE_JONGEN_VOOR_WIE_WE_ONZE_HANDEN_WASSEN._—_Linde_Volgt_Nick_op_pyjamadag.wav_TEXTENHANCE.txt not found<br/><br/>ERROR: maus.pipe : service returned error status 1 - exiting
Unable to process the files: 204

Ok, thank you for this one. This really is a server-side error (I’ll see if I can reproduce it). I guess the other recognizer will show a different error; could you add that one too?
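Going by the StdErr line in the report, the service rejects input file names that contain whitespace, non-ASCII characters or regular expression metacharacters. Below is a minimal Python sketch for deriving a plain ASCII name for a copy of the files before uploading them. Whether renaming is enough to make the recognizer succeed is an assumption and has not been confirmed here; the function name safe_name is made up for this example.

```python
import re
import unicodedata


def safe_name(filename):
    """Return a copy of filename using only ASCII letters, digits, '.', '_' and '-'."""
    # Decompose accented letters so the base letter survives the ASCII step.
    decomposed = unicodedata.normalize("NFKD", filename)
    # Drop every character that has no ASCII representation.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace whitespace and any other remaining character with an underscore.
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", ascii_only)
    # Collapse runs of underscores left behind by removed characters.
    return re.sub(r"_+", "_", cleaned)
```

The same name (apart from the extension) would have to be used for the audio copy and the text copy, so that the two still clearly belong together.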

Hi Han

The report for the Text-to-Speech recognizer is quite big. Is there any other way that I can send it to you?

If it is big, my assumption for this recognizer was wrong too…
You can mail it to me at han.sloetjes AT