I am trying to work out the best way to convert a large transcribed corpus into a more up-to-date format, and was wondering if anyone had any pointers…
The files are in the CLAN .cha (CHAT) format and are time-aligned, audio-linked files. The audio files were originally in .aiff format and are now also available as .wav files.
So far I have tried several ways of doing this automatically, without much success:
- I imported the files into ELAN using the import function. This worked insofar as, once they are imported, I can see the files and play the corresponding audio. However, when I try to save them in the ELAN .eaf format I get the following error message:
“Unable to save this file The character “ is an invalid xml character”
If I ignore this and save the file anyway, it is empty when I re-open it.
- I tried exporting the files as .TextGrid files to see whether they could then be read by Praat, and then possibly re-imported as a TextGrid and saved in the ELAN format. Praat could not read the TextGrid, and although ELAN imported the file, it was incomplete (lots of the turns were missing) and it was not linked to the audio.
- Finally, another option was to export the text files and then manually go back through and time-stamp them. This method does work but will be extremely time-consuming, so if possible I would like to use a more efficient method.
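In case it helps with diagnosing the ELAN save error, here is a minimal Python sketch of how the offending characters might be located. It assumes (unconfirmed) that the .cha files contain control characters outside the range XML 1.0 permits. The function names `find_invalid_xml_chars` and `strip_invalid_xml_chars` are made up for illustration and are not part of ELAN, CLAN or Praat.

```python
import re

# Matches any character outside the XML 1.0 "Char" production:
# tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD and U+10000-U+10FFFF are allowed.
INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text):
    """Return (line number, column, code point) for each invalid character."""
    hits = []
    # Split on "\n" only: str.splitlines() also splits on some control
    # characters, which would hide exactly the characters being searched for.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for match in INVALID_XML_CHARS.finditer(line):
            code_point = f"U+{ord(match.group()):04X}"
            hits.append((line_no, match.start() + 1, code_point))
    return hits


def strip_invalid_xml_chars(text, replacement=""):
    """Return a copy of text with every invalid character replaced."""
    return INVALID_XML_CHARS.sub(replacement, text)


# Hypothetical sample text, not taken from the corpus.
sample = "ok line\nbad\x15char\n"
print(find_invalid_xml_chars(sample))
```

Reporting the characters first seems safer than stripping them straight away: if they turn out to be markers that the time alignment depends on, removing them blindly could lose the link to the audio.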
Any advice gratefully received!
Sophie Holmes-Elliott
