I have a strange problem with a CSV file. The file contains the text transcripts of a dataset and was originally created in ELAN by someone else.
It seems to be corrupted somehow: some of the annotations are duplicated. A correct annotation with the correct timestamps [e.g. 60647,00:30:25.9,00:30:44.5,yeah it’s great,8,5,T2] is often followed by a copy of itself with different timestamps [e.g. 60648,00:30:29.9,00:30:48.5,yeah it’s great,8,5,T2].
I would like to fix this manually by deleting the copies, but here is the problem: every time I export the file, some annotations are duplicated again!
To be clear, my workflow is the same one I have used with other files without any problems: I import the transcripts into ELAN as a CSV file, possibly edit them (the duplication happens whether or not I change anything), and export them as tab-delimited text with the default settings.
I also tried importing the transcripts, copying the annotations to new tiers, and deleting the old tiers, but the result was the same.
Potentially useful information about these duplicated annotations:
- They can be adjacent to the original, or separated from it by a pause or even by another annotation;
- Sometimes they have the same duration as the original, sometimes not;
- The duplications introduced by exporting are not limited to lines that were already duplicated in the original file; other lines are affected too.
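Until the cause is found, a small script can at least list candidate copies so they don't have to be hunted down by eye. The sketch below is only an illustration under assumptions: it takes the column layout from the example rows above (id, start, end, text, then extra fields with the tier last), assumes timestamps look like `hh:mm:ss.f`, and uses a hypothetical `max_gap` threshold of 10 seconds. Following the observations listed above, it ignores end times (durations can differ) and groups by text and trailing fields, so pauses or other annotations in between do not matter.

```python
import csv
import io
from collections import defaultdict


def parse_time(ts: str) -> float:
    """Convert an 'hh:mm:ss.f' timestamp (format assumed from the example rows) to seconds."""
    hours, minutes, seconds = ts.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def find_suspected_copies(rows, max_gap=10.0):
    """Return (original_id, copy_id) pairs for rows that repeat an earlier row.

    Assumed layout: id, start, end, text, extra fields... (tier last).
    A row counts as a suspected copy if it has the same text and extra fields
    as an earlier row and starts at most max_gap seconds after it.
    End times are ignored because copies may have a different duration.
    """
    seen = defaultdict(list)  # (text, extra fields) -> [(start_seconds, id), ...]
    pairs = []
    # Process rows in order of start time so "earlier" means earlier in the recording.
    for row in sorted(rows, key=lambda r: parse_time(r[1])):
        row_id, start, _end, text, *extra = row
        key = (text.strip(), tuple(extra))
        start_s = parse_time(start)
        for prev_start, prev_id in seen[key]:
            if 0 <= start_s - prev_start <= max_gap:
                pairs.append((prev_id, row_id))
                break
        seen[key].append((start_s, row_id))
    return pairs


# Usage with the example rows from the post plus one unrelated row.
sample = """60647,00:30:25.9,00:30:44.5,yeah it’s great,8,5,T2
60648,00:30:29.9,00:30:48.5,yeah it’s great,8,5,T2
60649,00:31:02.0,00:31:05.0,okay,8,5,T2
"""
rows = list(csv.reader(io.StringIO(sample)))
print(find_suspected_copies(rows))
```

For a tab-delimited export, `csv.reader(f, delimiter="\t")` would be used instead. The output is a list to review, not something to delete blindly: a speaker who genuinely repeats the same phrase within `max_gap` seconds would also be flagged.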
