Find most common sequences in a tier

Hello,
I have annotated behaviour using interval coding (with pre-segmented blocks of 10 seconds) according to a controlled vocabulary. Now I would like to find out which sequence of codes most likely leads to a specific code, let's say code “A”. I assume it would be enough to find the most common sequence among the 3 or 4 codes preceding “A”.

I could not find an option in ELAN to run such a sequential analysis automatically, so I have started doing it manually, but it's an endless and very error-prone task.

Thanks in advance,
Marta

Hello Marta,

I think there are two functions that you can explore; both work on multiple files:

  • File->Multiple File Processing->N-gram Analysis... lets you load the files (can also be just one), select a tier and specify the size of the n-gram (e.g. 4 or 5 in your example). The resulting statistics are sorted (by default) by the number of occurrences of each pattern. The resulting table can be exported or copied and pasted into, e.g., a spreadsheet application.

  • Search->Structured Search Multiple eaf... The Single Layer Search tab has an N-gram over annotations mode, and the results view has an option to specify the context size; it always shows both left and right context. You could search for “A” and set the context size to 3 or 4. Counting would have to be done after exporting the results to, e.g., a spreadsheet application.
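
If you end up counting outside ELAN after an export, the counting step itself is small. Below is a minimal Python sketch, assuming you already have the annotation values of the tier as a plain list in time order. The function name `preceding_sequences` and the sample data are hypothetical illustrations, not part of ELAN or of any export format.

```python
from collections import Counter


def preceding_sequences(codes, target, context_size):
    """Count the runs of `context_size` codes that directly precede `target`.

    `codes` is assumed to be the tier's annotation values in time order.
    Occurrences of `target` with fewer than `context_size` codes before
    them are skipped.
    """
    counts = Counter()
    for i, code in enumerate(codes):
        if code == target and i >= context_size:
            counts[tuple(codes[i - context_size:i])] += 1
    return counts


# Hypothetical tier content: one code per 10-second block, in time order.
codes = ["B", "C", "D", "A", "B", "C", "D", "A", "C", "B", "D", "A"]

counts = preceding_sequences(codes, target="A", context_size=3)

# List the preceding sequences, most frequent first.
for sequence, n in counts.most_common():
    print(" -> ".join(sequence), n)
```

With this sample list, the sequence B, C, D precedes “A” twice and C, B, D once. Note that the sketch treats the list as one continuous recording; with several files you would count per file and add the counters together, so that no sequence spans a file boundary.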

-Han

Thank you very much for your prompt response!
Best,
Marta