Annotation via Interlinearization Mode and Whitespace Text Analyzer Configuration

Marco · 13 January 2025 11:33

Hi everyone,

I am trying to tokenize speech tiers in ELAN by using the Interlinearization Mode.

First, in the “Configure Source-Target Configuration” dialog, I selected “Whitespace Text Analyzer” and the correct source and target types (i.e., Speech and Words).

Then I configured the “Whitespace Text Analyzer”. Since I don’t want punctuation marks such as . and , to be included in the target type “Words”, I selected the “Remove” option and clicked “Apply”.
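To make the expected result concrete, here is a minimal Python sketch of whitespace tokenization with punctuation removal. It is not ELAN's implementation. It assumes that “Remove” deletes every occurrence of the listed characters (here only . and ,) and drops tokens that end up empty. The function name `whitespace_tokenize` is hypothetical.

```python
def whitespace_tokenize(text: str, remove: str = ".,") -> list[str]:
    """Hypothetical illustration, not ELAN code: split on whitespace,
    then delete every character listed in `remove` from each token."""
    # Translation table that maps each character to be removed to None.
    table = str.maketrans("", "", remove)
    tokens = []
    for raw in text.split():  # split on any run of whitespace
        cleaned = raw.translate(table)
        if cleaned:  # drop tokens that consisted only of punctuation
            tokens.append(cleaned)
    return tokens


print(whitespace_tokenize("Hi everyone, I am here."))
# ['Hi', 'everyone', 'I', 'am', 'here']
```

With the option applied, “everyone,” and “here.” should come out as “everyone” and “here” on the Words tier rather than keeping the comma and period.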

Then I clicked “Analyze / Interlinearize”. The speech annotations are automatically split into single word tokens, but my configuration is ignored, i.e., periods and commas are not removed. What am I doing wrong?

Thank you so much in advance!

divya.kanekal · 13 January 2025 14:58

Hi Marco,

The steps you describe seem correct to me.
I tried the same steps in ELAN 6.9, and when I click “Analyze / Interlinearize” at the top, or run “Analyze / Interlinearize” on an individual “speech” annotation, I get the tokenized result on “Words” with the punctuation removed.

Which version of ELAN are you using?

Best,
Divya