merging eaf files

We’re working on comparing two (or more) people’s annotations of the same videos for reliability purposes. Part of this involves taking two eaf files (let’s call them eaf1 and eaf2) and creating a third eaf file (eaf3) that contains all of the tiers and annotation data from eaf1 and eaf2. I think the “Merge Transcriptions” option in the “File” menu is meant to do this, but it never seems to work the way I expect. After merging, eaf3 should have as many tiers as eaf1 and eaf2 combined, but it only has as many tiers as one of them. (Usually eaf1 and eaf2 have the same number of tiers and the same tier names, based on a template for our lab, so I don’t know if this is the problem.) Not all of the annotations from eaf1 and eaf2 always end up in eaf3, either. For instance, if the tier “hand motion” has 25 annotations in eaf1 and 0 in eaf2, eaf3 has 25 in that tier. However, if there are 25 in eaf1 and 21 in eaf2, there might be only 26 or 27 in eaf3.

Is this the expected behavior of the “Merge Transcriptions” option? I’m currently trying to write my own Java code, based on the ELAN source code, to do this (perhaps renaming each tier by appending the annotator’s initials to the tier name), but I’m having trouble figuring out where to start. Are there any classes I should look at that have functionality similar to what I’m trying to do (merging multiple eaf files into one without merging tiers together)?

Thanks!

If the tier names in eaf1 and eaf2 are the same, those tiers are merged together in eaf3, and as a result their annotations are merged too, so the merged tier can end up with a different number of annotations than the two source tiers added together. I could add a request to the user request list for “a new option in the Merge Transcriptions dialog to rename one of two tiers with the same name instead of merging them”.
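To see why the counts come out this way, here is a simplified Java model of a name-keyed merge. It is not ELAN’s code: the class `NameKeyedMergeModel` and its overlap rule are hypothetical. It assumes that, because annotations on one time-aligned tier may not overlap, an incoming annotation that overlaps one already on the tier is skipped (ELAN may instead overwrite or trim it). Two same-named tiers collapse into one, and overlapping annotations from the second annotator drop out, which is how 25 + 21 can become 26 or 27.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of a name-keyed tier merge; NOT ELAN's actual code.
class NameKeyedMergeModel {

    // An annotation as a time interval in milliseconds plus its value.
    static final class Ann {
        final long start;
        final long end;
        final String value;

        Ann(long start, long end, String value) {
            this.start = start;
            this.end = end;
            this.value = value;
        }

        boolean overlaps(Ann other) {
            return start < other.end && other.start < end;
        }
    }

    // Tiers are keyed by name, so same-named tiers from different files end up as ONE tier.
    // Assumption: an incoming annotation that overlaps one already on the tier is skipped.
    static Map<String, List<Ann>> merge(List<Map<String, List<Ann>>> transcriptions) {
        Map<String, List<Ann>> merged = new LinkedHashMap<>();
        for (Map<String, List<Ann>> transcription : transcriptions) {
            for (Map.Entry<String, List<Ann>> tier : transcription.entrySet()) {
                List<Ann> target = merged.computeIfAbsent(tier.getKey(), k -> new ArrayList<>());
                for (Ann incoming : tier.getValue()) {
                    if (target.stream().noneMatch(incoming::overlaps)) {
                        target.add(incoming);
                    }
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, List<Ann>> eaf1 = new LinkedHashMap<>();
        eaf1.put("hand motion", List.of(new Ann(0, 1000, "wave"), new Ann(2000, 3000, "point")));
        Map<String, List<Ann>> eaf2 = new LinkedHashMap<>();
        // The second annotator marks roughly the same wave, plus one new event.
        eaf2.put("hand motion", List.of(new Ann(100, 900, "wave"), new Ann(5000, 6000, "reach")));

        Map<String, List<Ann>> eaf3 = merge(List.of(eaf1, eaf2));
        System.out.println(eaf3.size());                    // prints 1 (one tier, not two)
        System.out.println(eaf3.get("hand motion").size()); // prints 3 (not 2 + 2 = 4)
    }
}
```

The tier count after such a merge is the number of distinct tier names across the files, not the sum of the tier counts, which matches what happens when eaf1 and eaf2 share a template.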

A workaround would be to copy all tiers in eaf1 under new names and then merge the copied tiers with the tiers in eaf2 (i.e. basically renaming the tiers in one of the eaf files).
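The renaming half of this workaround can also be done outside ELAN by editing the .eaf XML directly. Below is a minimal sketch using only the JDK’s DOM API. It assumes, from the EAF files we have looked at, that tiers are `TIER` elements named by their `TIER_ID` attribute and that dependent tiers point at their parent through `PARENT_REF`. `EafTierRenamer` is a hypothetical helper, not an ELAN class.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not part of ELAN) that renames every tier in an EAF DOM.
class EafTierRenamer {

    // Appends `suffix` to every TIER_ID, and to every PARENT_REF so that
    // dependent tiers stay attached to their renamed parents.
    static void renameTiers(Document eaf, String suffix) {
        NodeList tiers = eaf.getElementsByTagName("TIER");
        for (int i = 0; i < tiers.getLength(); i++) {
            Element tier = (Element) tiers.item(i);
            tier.setAttribute("TIER_ID", tier.getAttribute("TIER_ID") + suffix);
            if (tier.hasAttribute("PARENT_REF")) {
                tier.setAttribute("PARENT_REF", tier.getAttribute("PARENT_REF") + suffix);
            }
        }
    }

    // Parses EAF XML from a string; wraps checked parser exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse EAF XML", e);
        }
    }

    // Serializes the DOM back to an XML string.
    static String serialize(Document doc) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize EAF XML", e);
        }
    }
}
```

For example, you could parse eaf1, call `renameTiers(doc, "_AB")` with the first annotator’s initials, write the result out, and then merge that file with eaf2 in ELAN as usual. Linguistic types are referenced through `LINGUISTIC_TYPE_REF` rather than by tier name, so this sketch leaves them unchanged.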

But if you wish to implement it yourself, we can help you find the source code for Merge Transcriptions in the source tree.

- Aarthy

Thanks for getting back to me. As of now I’m looking through the MergeStep1, MergeStep2, and MergeTranscriptionsCommand classes (I’m not interested in the GUI parts yet, so I’m trying to figure out what I actually need). Right now we’re just going to have a CSV file listing the .eaf files we want to merge, the destination of the new merged file, and possibly the tiers from each file that we’re interested in. I also want to rename the tiers by appending the annotator’s initials to each tier name. I know you can do a lot of this manually, but the people I’m working with use a lot of tiers and will be merging quite a few files, so anything I can automate will really save them work.
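For the CSV-driven batch, one possible layout (purely hypothetical; adjust it to whatever your lab settles on) has one row per source file: destination, source .eaf, annotator initials, and an optional semicolon-separated list of tiers. The sketch below groups rows by destination so that each group becomes one merge job. It assumes a header row and no quoted fields, so paths and tier names must not contain commas. `MergeJobCsv` is an illustrative name, not an ELAN class.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser for a merge job list such as:
//   destination,source,initials,tiers
//   merged/s01.eaf,coder1/s01.eaf,AB,hand motion;gaze
//   merged/s01.eaf,coder2/s01.eaf,CD,
class MergeJobCsv {

    // One source file to pull into a merged transcription.
    static final class Source {
        final String eafPath;
        final String initials;
        final List<String> tiers; // empty list means "take every tier"

        Source(String eafPath, String initials, List<String> tiers) {
            this.eafPath = eafPath;
            this.initials = initials;
            this.tiers = tiers;
        }
    }

    // Groups rows by destination file, keeping the order in which they appear.
    // Assumes a header row and no quoted fields (no commas inside values).
    static Map<String, List<Source>> parse(List<String> lines) {
        Map<String, List<Source>> jobs = new LinkedHashMap<>();
        if (lines.isEmpty()) {
            return jobs;
        }
        for (String line : lines.subList(1, lines.size())) {
            if (line.isBlank()) {
                continue;
            }
            String[] f = line.split(",", -1); // -1 keeps a trailing empty tier column
            if (f.length < 3) {
                throw new IllegalArgumentException("Too few columns: " + line);
            }
            List<String> tiers = new ArrayList<>();
            if (f.length > 3 && !f[3].isBlank()) {
                for (String t : f[3].split(";")) {
                    tiers.add(t.trim());
                }
            }
            jobs.computeIfAbsent(f[0].trim(), k -> new ArrayList<>())
                .add(new Source(f[1].trim(), f[2].trim(), tiers));
        }
        return jobs;
    }
}
```

Reading the file with `Files.readAllLines(Path.of("jobs.csv"))` and passing the lines to `parse` would then give one list of sources per destination file, each carrying the initials to append to its tier names.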

Thanks again for getting back to me.

Good, you found the classes involved. MergeTranscriptionsCommand, together with one or two utility classes, can be useful for you, but since you want to drive the process from a CSV file, you’ll still have to implement a lot yourself. If you manage it and your solution is generic (e.g. no hardcoded tier names), it would be interesting to include it in ELAN or make it available via ELAN’s third-party resources page. It seems like functionality that more people would be interested in.
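One generic approach could bypass ELAN’s merge logic entirely and copy whole tiers between DOMs, renaming them as it goes, with no tier names hardcoded. The sketch below (`EafTierImporter` is hypothetical, not part of ELAN) assumes the target file already contains a `TIME_ORDER` element and the `LINGUISTIC_TYPE` definitions that the imported tiers reference, as it would if every file comes from the same lab template, and that all files are time-aligned to the same media. Besides appending initials to tier names, it prefixes time-slot and annotation IDs so that `ts1` or `a1` from two files cannot collide.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not an ELAN class) that copies every tier of one EAF DOM into another.
class EafTierImporter {

    // Attributes holding time-slot or annotation IDs; these must stay unique in the merged file.
    private static final String[] ID_ATTRS = {
        "TIME_SLOT_ID", "TIME_SLOT_REF1", "TIME_SLOT_REF2",
        "ANNOTATION_ID", "ANNOTATION_REF", "PREVIOUS_ANNOTATION"
    };

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse EAF XML", e);
        }
    }

    // Copies all time slots and tiers of `source` into `target`: tier names get
    // `tierSuffix` appended (e.g. "_AB"), IDs get `idPrefix` prepended (e.g. "c1_").
    static void importAll(Document target, Document source, String tierSuffix, String idPrefix) {
        Node timeOrder = target.getElementsByTagName("TIME_ORDER").item(0);
        if (timeOrder == null) {
            throw new IllegalArgumentException("Target has no TIME_ORDER element");
        }
        NodeList slots = source.getElementsByTagName("TIME_SLOT");
        for (int i = 0; i < slots.getLength(); i++) {
            Element slot = (Element) target.importNode(slots.item(i), true);
            prefixIds(slot, idPrefix);
            timeOrder.appendChild(slot);
        }

        // EAF lists TIER elements before LINGUISTIC_TYPE, so insert there (or append if absent).
        Element root = target.getDocumentElement();
        Node before = target.getElementsByTagName("LINGUISTIC_TYPE").item(0);
        NodeList tiers = source.getElementsByTagName("TIER");
        for (int i = 0; i < tiers.getLength(); i++) {
            Element tier = (Element) target.importNode(tiers.item(i), true);
            tier.setAttribute("TIER_ID", tier.getAttribute("TIER_ID") + tierSuffix);
            if (tier.hasAttribute("PARENT_REF")) {
                tier.setAttribute("PARENT_REF", tier.getAttribute("PARENT_REF") + tierSuffix);
            }
            prefixIds(tier, idPrefix);
            root.insertBefore(tier, before);
        }
    }

    // Prefixes the ID attributes on `element` and on all of its descendants.
    private static void prefixIds(Element element, String idPrefix) {
        prefixAttrs(element, idPrefix);
        NodeList descendants = element.getElementsByTagName("*");
        for (int i = 0; i < descendants.getLength(); i++) {
            prefixAttrs((Element) descendants.item(i), idPrefix);
        }
    }

    private static void prefixAttrs(Element e, String idPrefix) {
        for (String attr : ID_ATTRS) {
            if (e.hasAttribute(attr)) {
                e.setAttribute(attr, idPrefix + e.getAttribute(attr));
            }
        }
    }
}
```

Calling `importAll` once per source file (for instance with suffix `_AB` and prefix `c1_` for the first annotator, `_CD` and `c2_` for the second) builds up the merged document, starting from, say, a copy of the lab template with its tiers removed. Because the sketch appends time slots without re-sorting `TIME_ORDER`, it is worth opening the result in ELAN to confirm that it loads as expected.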

-Han