I am trying to export multiple ELAN files to CSV so I can continue my data analysis in R. My most important tiers in ELAN are a transcription tier and a word-tokenised tier. All my .eaf files can be exported and read into R without throwing any errors, but the durations of most of the annotations are wrong. I investigated a bit, and it looks like the transcriptions that contain only one word have the correct duration. This leads me to believe that the issue might come from the word-tokenised tier.
Is this a common issue? Is there a way of exporting with the correct durations?
In this example, the highlighted annotation is 5513 ms long, but in my CSV file the same annotation has a duration of 324 ms.
In your export, if you have also exported the S_Transcription-txt tier, you will see that annotation's duration there as 5513 ms. The annotations on the tokenised tier will each have a duration of 324 ms. This is because the words in the parent annotation are split into equal parts on the tokenised tier: 5513 ms divided by 17 tokens is about 324 ms. All tokens (the words in that parent annotation) on the destination tier (S-Words-txt-da) therefore have this same duration.
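To make the arithmetic concrete, here is a minimal Python sketch of the equal split described above. The function name `token_durations` is hypothetical, and rounding to whole milliseconds is an assumption about how the value ends up as 324; ELAN's own rounding may differ by a millisecond on individual tokens.

```python
def token_durations(parent_ms: int, n_tokens: int) -> list[int]:
    """Split a parent annotation's duration equally across its tokens.

    Assumption: each token gets the parent duration divided by the
    number of tokens, rounded to the nearest whole millisecond.
    """
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    per_token = round(parent_ms / n_tokens)
    return [per_token] * n_tokens


# The example from this thread: a 5513 ms utterance containing 17 words.
durations = token_durations(5513, 17)
print(durations[0])    # duration reported for each token
print(len(durations))  # one entry per word
```

Because of the rounding, adding up the token durations will not necessarily return the exact parent duration, so the utterance length is best read from the parent tier's own row in the export.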
Thank you for responding! I did export that tier as well, but the correct duration of the entire utterance is not there. In R, the specific utterance I gave as an example looks like this:
Hi Johanne,
Could you please share a screenshot of your export settings, as well as the export file itself, so that I can look into this in more detail?
You can email them to me (Divya.Kanekal AT mpi.nl).
Sure, I’ll send you an email!