Change Parent Tier for multiple tiers

Hi (is Han still running the show? If so - Hi Han!)!

I’m doing the unthinkable, which is trying to force a tier structure (e.g., both Parent and Child tiers) on an ELAN file that currently consists only of Parent tiers. This is because I did data cleaning in R using the .csv and then reimported back to ELAN, but now I want to get the dependencies sorted so that the ELAN file is functional for coding going forward.

For very good reasons, I understand that you can’t simply import the .csv into a template with dependent tiers: because of the constraints, some annotations might disappear if the alignment of any of the proposed child annotations is slightly out of sync with the proposed parent annotations. I have, however, time-aligned all the dependent annotations in R on the .csv, so I am confident that if I could do this, I wouldn’t lose anything (and I can run checks to determine whether anything has disappeared).

So as I understand it, my options are “Change Parent of Tier” or “Copy Annotations from Tier to Tier” from within ELAN. However, I will have to do this across multiple files, with multiple tiers per file, introducing the possibility of errors if I do this tier by tier, file by file (although it’s not impossible, and I can run automatic checks on the files after to make sure it’s been done correctly).
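An automatic check of this kind could be sketched as follows. This is only an illustration, assuming the xml2 package; the toy document, the tier names and the `csv_rows` data frame are hypothetical stand-ins for a real .eaf file and the cleaned .csv. It compares the number of annotations per tier in the file against the number of rows per tier in the .csv.

```r
library(xml2)

# Toy stand-in for a re-imported .eaf file (tier names are hypothetical).
eaf_file <- read_xml('<ANNOTATION_DOCUMENT>
  <TIER LINGUISTIC_TYPE_REF="default-lt" TIER_ID="utterance">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>hello</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts3" TIME_SLOT_REF2="ts4">
        <ANNOTATION_VALUE>world</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER LINGUISTIC_TYPE_REF="default-lt" TIER_ID="gesture">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a3" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>wave</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>')

# Toy stand-in for the cleaned .csv: one row per annotation.
csv_rows <- data.frame(
  tier  = c("utterance", "utterance", "gesture"),
  value = c("hello", "world", "wave"),
  stringsAsFactors = FALSE
)

# Expected number of annotations per tier, according to the .csv.
expected <- table(csv_rows$tier)

# Number of ANNOTATION elements actually present on each of those tiers.
found <- vapply(names(expected), function(id) {
  length(xml_find_all(eaf_file, paste0(".//TIER[@TIER_ID='", id, "']/ANNOTATION")))
}, integer(1))

# Tiers where the counts disagree; an empty result means nothing was dropped.
mismatches <- names(expected)[found != as.vector(expected)]
print(mismatches)
```

A count comparison only catches dropped annotations; comparing the values themselves would be a natural next step.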

I’ve tried scripting it, by directly editing the xml in R (following Amalia Skilton’s excellent GitHub repository of code - github[dot]com/amaliaskilton/eaf_scripts), however my understanding of the XML is limited, and editing all the different elements of the annotation script for each annotation (e.g., moving from TIME_SLOT_REF to ANNOTATION_REF) is an intense process, especially in R. Again, not theoretically impossible, but perhaps at this point, a longer task than the edits using ELAN.

So my question is - is there anything I’ve missed - i.e., some functionality I don’t know about (e.g., within the Multi File Editor) that can make this task easier? No problem if not, of course, the options that exist within ELAN are still incredibly helpful - this issue has arisen because of misuse of ELAN…

Thanks!
Ed

Hi Ed!

I’m still there, not sure if I’m running anything though.

First of all, I don’t think you missed anything: the functionality you’re looking for is not available in ELAN. The “Copy Tier” function is similar to “Change Parent of Tier” and might be considered as well (next to “Copy Annotations from Tier to Tier”); which function is most suitable depends on the structure of the starting point. But indeed, it takes a lot of manual steps on a file-by-file basis to accomplish your goal.

Achieving this by direct processing of the XML sounds quite complicated and error-prone to me. It may be doable in very controlled situations, e.g. when you know that the number of annotations on the two tiers is exactly the same, etc., but, just like with importing a .csv into a template (which would indeed be a nice function to have), usually all kinds of checks have to be built in and irregularities have to be dealt with.

I don’t have a running R environment on any of my computers at the moment, so I can’t really estimate how much it would take to modify an existing script to update the structure of an eaf file resulting from a csv import.

Best,
Han

Hi Han!

Thanks for the response!

So, this is very interesting! Actually, the controlled situation is not so implausible in our case. What do you think to the following?

The setup. I have imported a .csv file into an ELAN template. All the parent tiers that will be parent tiers have annotations on. All the child tiers (specified by the template) are blank, because you can’t import onto child tiers from a csv. There are therefore extra parent tiers with the information that I would like to be on the blank child tiers also present in the document. Let’s for ease say that the child tiers are all constrained by Symbolic Association.

Steps:

  1. On the parent tier that I want to be a child tier, one could make sure that it has the same number of annotations as its proposed parent tier (with a placeholder value, e.g., “placeholder mcplaceholderface” to indicate annotations that will be ultimately removed). Crucially these would fill in gaps - so if a1 on the proposed parent tier had a proposed child annotation, a2 and a3 didn’t, but a4 did, then the proposed child tier needs to have 4 annotations with two placeholders in the middle. This can be done in the .csv relatively painlessly with R code.

  2. I could simply add annotations on all the (currently blank) child tiers from the parent tiers in ELAN (which is simply a few clicks thanks to ELAN’s super functionality in that regard, e.g., “Create Annotations on Dependent Tiers…”, which can be done simultaneously for every child tier in the file). That way, all the ANNOTATION_REF and TIME_SLOT_REF values in the XML have already been set up by ELAN and are in place, so I don’t need to worry about attempting to code any of that (which is fraught with danger).

  3. Then, I could run code to edit the XML to simply put the annotation values in from the proposed child tiers onto the actual child tiers, by using code to sub these in between the tags on the actual child tiers. Amalia’s code should work in that regard I think (it can basically paste the annotation values from one tier onto another as long as the number of nodes/annotations is exactly the same). The R script (condensed from Amalia’s github code) would be:

    target_nodes <- xml_find_all(eaf_file, paste0(".//TIER[@TIER_ID='", target_tier_name, "']//ANNOTATION//REF_ANNOTATION"))
    source_nodes <- xml_find_all(eaf_file, paste0(".//TIER[@TIER_ID='", source_tier_name, "']//ANNOTATION//ALIGNABLE_ANNOTATION"))
    xml_text(target_nodes) <- xml_text(source_nodes)

I note that you need to do a little bit of housekeeping on the code before it can write back to the ELAN file (e.g., some gsubs to replace some closing tags that get a little changed by read_xml - again identified by Amalia).

  4. Use “Remove Annotations or Values” to get rid of the placeholder annotations on child tiers that were simply there to block that slot. Use Edit Multiple Files or similar to remove the now redundant parent tiers that have been replicated on the child tiers. Or perhaps this can be done in the XML file too. In any case, a quick process across all files in one go.
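The placeholder padding in step 1 could be sketched in base R roughly as follows. The column names (`begin`, `end`, `value`), the toy data and the placeholder string are all assumptions for illustration; the real .csv will be laid out differently. The sketch assumes each proposed child annotation shares its exact begin and end time with one parent annotation.

```r
placeholder <- "PLACEHOLDER"  # hypothetical marker, removed again in step 4

# Proposed parent tier: four annotations (a1 to a4).
parent <- data.frame(
  begin = c(0, 1000, 2000, 3000),
  end   = c(1000, 2000, 3000, 4000),
  value = c("a1", "a2", "a3", "a4"),
  stringsAsFactors = FALSE
)

# Proposed child tier: only a1 and a4 have a child annotation.
child <- data.frame(
  begin = c(0, 3000),
  end   = c(1000, 4000),
  value = c("wave", "point"),
  stringsAsFactors = FALSE
)

# Safety check: every child row must line up with exactly one parent interval,
# otherwise it would silently vanish in the padding step below.
matched <- merge(child, parent[, c("begin", "end")], by = c("begin", "end"))
stopifnot(nrow(matched) == nrow(child))

# Left join on the parent intervals, so every parent gets one child row.
padded <- merge(parent[, c("begin", "end")], child,
                by = c("begin", "end"), all.x = TRUE)
padded <- padded[order(padded$begin), ]

# Parents without a child annotation get the placeholder value.
padded$value[is.na(padded$value)] <- placeholder
print(padded)
```

The padded tier then has the same number of annotations as its proposed parent, in the same order, which is what the ordinal copy in step 3 relies on.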

Does that sound erm… acceptable? Is there anything here that you think I’m being hopelessly optimistic about, or crucial that I’ve missed that could spell disaster? It’s perhaps more efficient than Change Parent Tier x number of times, but does it introduce more problems I wonder…

I guess a weird quirk of this method is in fact the annotations on the parent tier that you want to copy to the child tier don’t even have to be correctly time aligned to the true parent tier, so long as they are ordinally correct relative to what you want them to align with on the parent tier (e.g., with the use of placeholders if some parent annotations should be missing child annotations)

Also, for implementing the above strategy on child tiers where the constraint is Included In, am I right in thinking the XML for the tier is identical to that of a parent tier, just that the linguistic type is one where that constraint exists (so nothing discernible changes in the XML, at least for the node concerning that tier, depending on whether it is a parent or a child with the Included In constraint)? If so, this should be even more straightforward, as we simply have to edit x in the LINGUISTIC_TYPE_REF="x" attribute of the TIER tag?

It feels more possible… :slight_smile:
Ed

Oops, just on that last point - I’m not correct. Let’s leave “Included In” constraints for now. I think the structure is different, there’s also a PARENT_REF. And I’m not sure if I’m walking into a whole load of pain with the timeslots…
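One way to see these differences is to list the relevant attributes for every tier. The following is a minimal sketch assuming the xml2 package; the toy document, tier names and linguistic type names are hypothetical. It reads PARENT_REF and LINGUISTIC_TYPE_REF from each TIER element and looks up the CONSTRAINTS attribute of the corresponding LINGUISTIC_TYPE element.

```r
library(xml2)

# Toy document with one top-level tier and two child tiers (names are hypothetical).
eaf_file <- read_xml('<ANNOTATION_DOCUMENT>
  <TIER LINGUISTIC_TYPE_REF="default-lt" TIER_ID="utterance"/>
  <TIER LINGUISTIC_TYPE_REF="gloss-lt" PARENT_REF="utterance" TIER_ID="gloss"/>
  <TIER LINGUISTIC_TYPE_REF="gesture-lt" PARENT_REF="utterance" TIER_ID="gesture"/>
  <LINGUISTIC_TYPE LINGUISTIC_TYPE_ID="default-lt" TIME_ALIGNABLE="true"/>
  <LINGUISTIC_TYPE CONSTRAINTS="Symbolic_Association" LINGUISTIC_TYPE_ID="gloss-lt" TIME_ALIGNABLE="false"/>
  <LINGUISTIC_TYPE CONSTRAINTS="Included_In" LINGUISTIC_TYPE_ID="gesture-lt" TIME_ALIGNABLE="true"/>
</ANNOTATION_DOCUMENT>')

# One row per tier; xml_attr() returns NA where an attribute is absent,
# so top-level tiers show NA in the parent column.
tiers <- xml_find_all(eaf_file, ".//TIER")
tier_info <- data.frame(
  tier   = xml_attr(tiers, "TIER_ID"),
  parent = xml_attr(tiers, "PARENT_REF"),
  type   = xml_attr(tiers, "LINGUISTIC_TYPE_REF"),
  stringsAsFactors = FALSE
)

# Look up the constraint attached to each tier's linguistic type.
types <- xml_find_all(eaf_file, ".//LINGUISTIC_TYPE")
constraint <- xml_attr(types, "CONSTRAINTS")
names(constraint) <- xml_attr(types, "LINGUISTIC_TYPE_ID")
tier_info$constraint <- unname(constraint[tier_info$type])

print(tier_info)
```

Running this on a file before and after a manual “Change Parent of Tier” would show which attributes ELAN actually changes.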

Ed

Hi Ed,

I guess this could work. I’m not sure if I fully understand the steps in step 1 (whether this is all in R or in combination with ELAN functionality), but I assume it is possible to get the number of annotations per tier combination right. Step 2 seems clear and step 3 too, but I don’t know how R exactly performs these functions (whether it first reads everything into memory and writes everything after processing) and whether or not you can specify more than one source and target tier at a time etc. Step 4 for removing the copied tiers seems clear too.

In the workflow you describe it could probably work for any tier type, as long as you have created the hierarchy in ELAN (in the template) and used the “Create Annotations on Dependent Tiers…” to create exactly one annotation on the child tiers. The tier type has then already been set and doesn’t need to be changed by the script.
The difference for Time Subdivision and Included In is that the target elements to paste into are not REF_ANNOTATIONs but ALIGNABLE_ANNOTATIONs. Also, since you intend to copy the annotation content, I guess you could maybe also use the path //ANNOTATION//REF_ANNOTATION//ANNOTATION_VALUE (and the same for ALIGNABLE_ANNOTATION), but, again, I don’t know exactly how R performs these functions.
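A minimal sketch of copying via the ANNOTATION_VALUE path, assuming the xml2 package that the functions quoted earlier in the thread appear to come from. The helper `copy_values()`, the toy document and the tier names are hypothetical. The `*` step in the path matches either REF_ANNOTATION or ALIGNABLE_ANNOTATION, and the copy is purely ordinal, so it assumes both tiers list their annotations in the same order.

```r
library(xml2)

# Toy document: an imported top-level tier and an empty dependent tier
# whose annotations were created in ELAN beforehand (names are hypothetical).
eaf_file <- read_xml('<ANNOTATION_DOCUMENT>
  <TIER LINGUISTIC_TYPE_REF="default-lt" TIER_ID="gesture_import">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>wave</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts3" TIME_SLOT_REF2="ts4">
        <ANNOTATION_VALUE>PLACEHOLDER</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER LINGUISTIC_TYPE_REF="gesture-lt" PARENT_REF="utterance" TIER_ID="gesture">
    <ANNOTATION>
      <REF_ANNOTATION ANNOTATION_ID="a3" ANNOTATION_REF="a10">
        <ANNOTATION_VALUE></ANNOTATION_VALUE>
      </REF_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <REF_ANNOTATION ANNOTATION_ID="a4" ANNOTATION_REF="a11">
        <ANNOTATION_VALUE></ANNOTATION_VALUE>
      </REF_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>')

# XPath to the value elements of one tier, whatever the annotation kind.
value_path <- function(tier_id) {
  paste0(".//TIER[@TIER_ID='", tier_id, "']/ANNOTATION/*/ANNOTATION_VALUE")
}

# Copy values from source to target by position; refuse if the counts differ.
copy_values <- function(eaf, source_tier, target_tier) {
  src <- xml_find_all(eaf, value_path(source_tier))
  tgt <- xml_find_all(eaf, value_path(target_tier))
  stopifnot(length(src) == length(tgt))
  xml_text(tgt) <- xml_text(src)  # modifies the document in place
  invisible(eaf)
}

copy_values(eaf_file, "gesture_import", "gesture")
print(xml_text(xml_find_all(eaf_file, value_path("gesture"))))
```

Targeting ANNOTATION_VALUE means the text is written into the value element itself, and the IDs and references that ELAN set up are left untouched.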

If not already on it, I’ll add the option to import a csv in combination with a template to the request list (which doesn’t mean it’ll be implemented soon :wink: ).

-Han

Hi Han!

Thanks, this is great! Just clarifying, yes - step 1 would be in R. And step 3 should be possible to implement on multiple tiers by simply using some for() loops in R! (And indeed the whole thing is possible using a big for() loop across all participants - assuming the file naming structure is good enough!) Some quick testing suggests that this approach will basically work. If there’s something interpretable at the end, I can post the code/procedure up here for people in the future.
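The nested loops might look roughly like this. It is a sketch under several assumptions: the xml2 package, a hypothetical `tier_pairs` mapping from imported tier to template child tier, and toy files written to a temporary folder in place of the real per-participant .eaf files. Results are written to a separate output folder so the originals stay untouched.

```r
library(xml2)

# Hypothetical mapping: imported top-level tier -> empty child tier from the template.
tier_pairs <- c(gesture_import = "gesture", gaze_import = "gaze")

value_path <- function(tier_id) {
  paste0(".//TIER[@TIER_ID='", tier_id, "']/ANNOTATION/*/ANNOTATION_VALUE")
}

# Toy input: two identical participant files in a temporary folder.
toy <- '<ANNOTATION_DOCUMENT>
  <TIER LINGUISTIC_TYPE_REF="default-lt" TIER_ID="gesture_import">
    <ANNOTATION><ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
      <ANNOTATION_VALUE>wave</ANNOTATION_VALUE></ALIGNABLE_ANNOTATION></ANNOTATION>
  </TIER>
  <TIER LINGUISTIC_TYPE_REF="default-lt" TIER_ID="gaze_import">
    <ANNOTATION><ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
      <ANNOTATION_VALUE>left</ANNOTATION_VALUE></ALIGNABLE_ANNOTATION></ANNOTATION>
  </TIER>
  <TIER LINGUISTIC_TYPE_REF="gesture-lt" PARENT_REF="utterance" TIER_ID="gesture">
    <ANNOTATION><REF_ANNOTATION ANNOTATION_ID="a3" ANNOTATION_REF="a10">
      <ANNOTATION_VALUE></ANNOTATION_VALUE></REF_ANNOTATION></ANNOTATION>
  </TIER>
  <TIER LINGUISTIC_TYPE_REF="gaze-lt" PARENT_REF="utterance" TIER_ID="gaze">
    <ANNOTATION><REF_ANNOTATION ANNOTATION_ID="a4" ANNOTATION_REF="a10">
      <ANNOTATION_VALUE></ANNOTATION_VALUE></REF_ANNOTATION></ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>'
in_dir  <- file.path(tempdir(), "eaf_in")
out_dir <- file.path(tempdir(), "eaf_out")
dir.create(in_dir, showWarnings = FALSE)
dir.create(out_dir, showWarnings = FALSE)
for (id in c("p01", "p02")) writeLines(toy, file.path(in_dir, paste0(id, ".eaf")))

# Outer loop over files, inner loop over tier pairs.
for (path in list.files(in_dir, pattern = "\\.eaf$", full.names = TRUE)) {
  eaf <- read_xml(path)
  for (source_tier in names(tier_pairs)) {
    src <- xml_find_all(eaf, value_path(source_tier))
    tgt <- xml_find_all(eaf, value_path(tier_pairs[[source_tier]]))
    if (length(src) != length(tgt)) {
      # Skip rather than guess when the ordinal pairing cannot hold.
      warning(basename(path), ": count mismatch for ", source_tier)
      next
    }
    xml_text(tgt) <- xml_text(src)
  }
  write_xml(eaf, file.path(out_dir, basename(path)))
}
```

Skipping a mismatched pair with a warning, rather than stopping, lets the loop finish and leaves a list of files to inspect by hand.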

For Time Subdivision and Included In, I think this is more complicated, because the time subdivisions have to be set somehow - i.e., with the correct slots (which may simply transfer over from the parent tier to the child tier), but then each one needs to be linked to the relevant parent annotation I think. More thought needed from me…

But yes - this functionality would be great! Although I do understand why it doesn’t exist yet, because if the dependent annotations in the csv do not exactly match the parent annotations then they will be dropped out. As you said, all the checks and irregularities would need to be addressed.

Thanks again!
ED