Has anyone noticed any issues with adding attributes directly to the XML generated by ELAN, either in the <ANNOTATION/>, <ALIGNABLE_ANNOTATION/>, or <ANNOTATION_VALUE/> tags? By “issue”, I mean here ELAN’s ability to reopen the file and function as usual after new attributes have been added.
I have experimented with it a little bit and seen no adverse effects, but I wouldn’t like to get deep into a project and find some issue someone else already experienced, or that the developers already know is a bad idea for some reason.
@Han (I see you usually answer in the forum), what do you think about this?
I think there are two issues with adding attributes that are not in the EAF schema:
- if you open such a file in ELAN it might seem that nothing special happens, but if you go to View->View Log… you’ll see error messages generated by the parser. Currently ELAN just ignores such errors, but this might change in the future. ELAN might become more strict in opening a file and then possibly offer an “ignore errors” mode so that an attempt can be made to repair the file in ELAN.
- if you save such a file, the additional attributes will be gone. So, adding attributes only makes sense if you’re not going to edit the files in ELAN anymore.
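For readers who want to try this, here is a minimal sketch of the kind of programmatic edit being discussed, using Python's standard `xml.etree.ElementTree`. The inline snippet is a cut-down stand-in for a real .eaf file, and `WORD_COUNT` and `add_attribute` are made-up names used only for illustration. As noted above, ELAN will log parser errors for such an attribute and drop it on save.

```python
import xml.etree.ElementTree as ET

# A tiny stand-in for an .eaf file; a real one also has HEADER, TIME_ORDER, etc.
EAF_SNIPPET = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="utterances">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>hello</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def add_attribute(root, name, value_for):
    """Set a custom attribute on every ALIGNABLE_ANNOTATION element.

    `value_for` is a callable that takes the element and returns a string.
    Returns the number of elements that were modified.
    """
    count = 0
    for ann in root.iter("ALIGNABLE_ANNOTATION"):
        ann.set(name, value_for(ann))
        count += 1
    return count


root = ET.fromstring(EAF_SNIPPET)
# WORD_COUNT is a hypothetical, non-schema attribute (not part of EAF).
touched = add_attribute(
    root,
    "WORD_COUNT",
    lambda ann: str(len((ann.findtext("ANNOTATION_VALUE") or "").split())),
)
print(ET.tostring(root, encoding="unicode"))
```

Because the attributes are lost whenever ELAN saves the file, a script like this would have to be re-run after each editing session.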
You could consider an alternative use of existing attributes, but these are mostly of the attribute type “IDREF”, so you would have to add corresponding elements (e.g. external references). Then it might be that this leads to crashes or errors in ELAN because ELAN interprets these attributes in a specific way. So, I wouldn’t like to encourage that approach either.
Yes, many errors in the log!
If I may advocate for the potential “ignore errors” mode, that would be fantastic. Lately, I’m finding it very convenient to programmatically manipulate the XML directly. Of course, it doesn’t require being able to reopen files in ELAN, but it’s super convenient to be able to do so!
“– if you save such a file, the additional attributes will be gone. So, adding attributes only makes sense if you’re not going to edit the files in ELAN anymore.”
I hadn’t noticed this, so I’m glad you pointed it out. I’ll make sure to take that into account from now on. Maybe if there’s a way to “ignore” added attributes on save would be a cool feature for the next version (if it’s even possible).
I’m not sure if I understand the last sentence about ignoring added attributes on save, but what I could imagine some users want is that any additional, non-schema attribute is read and stored in memory when an eaf is opened by ELAN and then written unchanged when the eaf is saved again. And some users might want to add attributes to annotations, others to tiers or tier types, others to controlled vocabulary entries and others to all of these. But there is currently no place for that in the internal data structures and it would be a huge transformation to support it. (If one would like to accommodate such deviation from the EAF schema at all.)
Thanks again, and sorry for the unclear sentence. Your interpretation is exactly what I meant. At least for my purposes, added attributes getting erased (or ignored) on save isn’t really an issue. All my extra attributes are added programmatically, which takes only about one second longer than opening the script file (10 hr corpus), so this can just be done before any subsequent processing that relies on those extra attributes.
As far as ignoring extra attributes on open, I hope if you do decide to make ELAN more strict that you’ll include that “ignore errors” option if possible.
Hi Han, sorry to revive this old post, but I’m back at this issue again in another project where I would like to be able to add additional XML attributes that persist after editing .eaf files in ELAN. Is it something that could be possible, e.g. by customizing the namespace? Or is the overwriting of non-defined elements built into the compiled ELAN executable?
Alternatively, I could imagine a workflow where additional attributes are stored in external files, but then more elaborate ANNOTATION_ID values would be necessary. I tried replacing a1, a2, a3 with a-<uuid>, which seems to work: XML validation completes with no errors or warnings. Open-edit-save-close several times works with no errors or warnings, and the uuid ANNOTATION_ID attribute values remain. Is this a safe (supported?) strategy?
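A rough sketch of that ID replacement, using only the Python standard library, might look like the following. It assumes that `ANNOTATION_REF` and `PREVIOUS_ANNOTATION` are the attributes that point at annotation IDs and therefore need rewriting too; that list should be checked against the EAF schema version in use. The inline snippet and the function name `rename_annotation_ids` are illustrative only.

```python
import uuid
import xml.etree.ElementTree as ET

# Cut-down stand-in for an .eaf file with one aligned and one reference annotation.
EAF_SNIPPET = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="utterances">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>hello</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="gloss" PARENT_REF="utterances">
    <ANNOTATION>
      <REF_ANNOTATION ANNOTATION_ID="a2" ANNOTATION_REF="a1">
        <ANNOTATION_VALUE>greeting</ANNOTATION_VALUE>
      </REF_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

# Assumed list of attributes that refer to another annotation's ID.
REFERENCE_ATTRS = ("ANNOTATION_REF", "PREVIOUS_ANNOTATION")


def rename_annotation_ids(root):
    """Replace every ANNOTATION_ID with 'a-<uuid4>' and return {old_id: new_id}."""
    mapping = {}
    # First pass: assign a new ID to every element that carries one.
    for elem in root.iter():
        old = elem.get("ANNOTATION_ID")
        if old is not None:
            # The "a-" prefix keeps the value a valid XML ID,
            # which may not start with a digit.
            mapping[old] = f"a-{uuid.uuid4()}"
            elem.set("ANNOTATION_ID", mapping[old])
    # Second pass: update references so they still point at the same annotation.
    for elem in root.iter():
        for attr in REFERENCE_ATTRS:
            ref = elem.get(attr)
            if ref in mapping:
                elem.set(attr, mapping[ref])
    return mapping


root = ET.fromstring(EAF_SNIPPET)
mapping = rename_annotation_ids(root)
```

Doing the rename in two passes means a reference is updated correctly even when it appears in the file before the annotation it points to.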
Finally, is there any way to safely stick custom XML elements in the eaf aside from the PROPERTY key/val pairs, without causing errors or getting overwritten while editing files in ELAN?
Thanks in advance!
Hi Bob, no problem to bring this up again. I’m afraid the situation concerning additional attributes and XML elements is still the same and I don’t think this will change in the near future.
But concerning the ANNOTATION_ID, that is a supported strategy: attempts are made to maintain the IDs of annotations (provided these are valid IDs) over multiple open/save and undo/redo actions, so it should be possible to build a workflow on that. I hope that’ll work!
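As an illustration of the external-file workflow discussed above, here is a minimal sketch that keeps the extra attributes in a JSON "sidecar" keyed by ANNOTATION_ID and joins them to the annotations at processing time. The sidecar layout, the `speaker_age` field, the placeholder IDs and the function names are all assumptions made for this example; none of it is an ELAN feature.

```python
import json
import xml.etree.ElementTree as ET

# Cut-down stand-in for an .eaf file; the IDs are placeholders for a-<uuid> values.
EAF_SNIPPET = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="utterances">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a-0001" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>hello</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a-0002" TIME_SLOT_REF1="ts3" TIME_SLOT_REF2="ts4">
        <ANNOTATION_VALUE>world</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

# Hypothetical sidecar content; "a-0003" stands for an annotation deleted in ELAN.
SIDECAR_JSON = '{"a-0001": {"speaker_age": "34"}, "a-0003": {"speaker_age": "51"}}'


def join_sidecar(root, sidecar):
    """Return (annotation_id, text, extra_attributes) for every annotation."""
    rows = []
    for elem in root.iter():
        ann_id = elem.get("ANNOTATION_ID")
        if ann_id is None:
            continue
        text = elem.findtext("ANNOTATION_VALUE") or ""
        # Annotations without a sidecar entry get an empty dict.
        rows.append((ann_id, text, sidecar.get(ann_id, {})))
    return rows


def orphaned_keys(root, sidecar):
    """Return sidecar keys that no longer match any annotation in the file."""
    ids = {e.get("ANNOTATION_ID") for e in root.iter() if e.get("ANNOTATION_ID")}
    return sorted(set(sidecar) - ids)


root = ET.fromstring(EAF_SNIPPET)
sidecar = json.loads(SIDECAR_JSON)
rows = join_sidecar(root, sidecar)
orphans = orphaned_keys(root, sidecar)
```

Checking for orphaned keys after each ELAN editing session is one way to notice annotations that were deleted or recreated with a new ID.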