Hi there!
Apologies if this is an obvious question, or one which has already been answered (I’ve looked through some of the previous posts with no luck).
I am using ELAN to transcribe audio files; my only tiers are transcription tiers, i.e., with one tier per speaker which holds the transcribed text. I want it to export in the format:
(1) 00:00:000 - 00:03:341 speaker 1: hi there, how are you?
I could also work with this:
(2) speaker 1 00:00:000 - 00:03:341 hi there how are you?
However, when I use the traditional transcript text export, the timestamp exports in the line below the transcript, i.e.,
(3) speaker 1: hi there how are you?
00:00:000 - 00:03:341
And when I use the tab-delimited text export option, I don’t get the desired output either. I’ve tried a couple of things, below:
→ With ‘exclude tier names’ selected (and the relevant time settings), I can get the format of (1) but the export is ordered by speaker not by time, i.e., it is ordered:
(4) speaker 1: first contribution
speaker 1: second contribution
…
speaker 1: last contribution
speaker 2: first contribution
etc for speaker 2
→ But if I select ‘Separate column for each tier’ (also with the relevant time settings), I do not get the participants’ names in the output.
It feels like I am just missing something very obvious, so please let me know what I should try!
With many thanks in advance,
Zoë
Hello Zoë,
I don’t think you’re missing something obvious. There are, for example, many options to customize the traditional transcript output, but exporting the time codes on the same line as the speaker, the annotation, etc. is not one of them. The same goes for export as interlinear text. The subtitle text formats omit the tier and/or participant’s name from the output.
I think the best choice is the first tab-delimited text export option you mention (so without the ‘Separate column…’ option). Then open or import the result in a spreadsheet application (e.g. Excel) and sort the table by the begin time column (with apologies if this is something you already do or tried).
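For anyone who would rather script the sort step than do it in a spreadsheet, here is a minimal Python sketch. It assumes each row of the tab-delimited export holds the tier name, begin time, end time and annotation in that order, with no header row; the actual column layout depends on the options ticked in the export dialog, so `begin_col` may need adjusting. The function names are made up for illustration and are not part of ELAN.

```python
def to_seconds(stamp: str) -> float:
    """Convert 'hh:mm:ss.ms', 'mm:ss.ms' or plain 'ss.ms' to seconds."""
    total = 0.0
    for part in stamp.strip().split(":"):
        total = total * 60 + float(part)
    return total


def sort_export_by_begin(text: str, begin_col: int = 1) -> str:
    """Return the tab-delimited rows of `text` ordered by begin time.

    Assumption: column `begin_col` (0-based) holds the begin time and
    there is no header row. Rows with equal begin times keep their
    original relative order because Python's sort is stable.
    """
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    rows.sort(key=lambda row: to_seconds(row[begin_col]))
    return "".join("\t".join(row) + "\n" for row in rows)


if __name__ == "__main__":
    # Invented sample rows, ordered by speaker as in example (4).
    sample = (
        "speaker 1\t00:00:00.000\t00:00:03.341\thi there, how are you?\n"
        "speaker 1\t00:00:07.500\t00:00:09.000\tgood to hear\n"
        "speaker 2\t00:00:03.400\t00:00:07.100\tfine, thanks\n"
    )
    print(sort_export_by_begin(sample), end="")
```

Reading each export file, passing its contents through `sort_export_by_begin` and writing the result back out would give one time-ordered file per transcript.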
-Han
Hi Han,
Thanks so much for your prompt response.
Ah, that’s great to know I wasn’t just missing something that was staring me in the face! That all makes sense, including importing to Excel (I hadn’t tried that yet). The only issue is that I have quite a lot of files to transcribe - about 1,000 5-minute files (with only 2 participants, interviewer-participant) - so I’ll need to test if doing that, or creating and annotating a ‘speaker’ tier from a controlled vocabulary, is more efficient.
Thanks again for your help,
Zoë
Hi Zoë, yes, with that quantity of files the suggested workflow is maybe too laborious. Unless it can be scripted somehow.
May I ask: in the end, do you need 1000 text files, or can the export of all transcriptions go into one text file? And, if you had the choice and an export format like (1) or (2) were available, would you prefer the different parts of a line to be separated by (a number of) spaces (like in the traditional text export) or by tabs (like in the tab-delimited text export)?
Hi Han,
Thanks for another helpful reply.
Indeed, writing a bit of code to place the timestamps in a column as opposed to the following row might be a good idea for our team.
To answer your questions: I do need 1000 separate text files at the end of it. And I would love for this to be a future feature of ELAN! For my purposes, separation with tabs would be excellent, as it would make the output more readable to other programmes and let them, for example, read the start and end times of utterances separately.
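As a rough idea of what that bit of code could look like, here is a Python sketch that moves each timestamp from the following row onto the same tab-separated line as the speaker and text. It assumes the traditional transcript export looks like example (3): a "speaker: text" line, then a "begin - end" line, already in time order. Anything else about the real export (wrapped lines, extra blank lines) is a guess, and the function and folder names are invented for illustration.

```python
import re
from pathlib import Path

# A line holding only "begin - end", e.g. "00:00:000 - 00:03:341".
TIME_LINE = re.compile(r"^\s*([\d:.]+)\s*-\s*([\d:.]+)\s*$")


def merge_timestamps(text: str) -> str:
    """Turn 'speaker: text' + timestamp-line pairs into tab-separated rows.

    Output columns: begin, end, speaker, text. Text lines seen before a
    timestamp line are joined with spaces (assumption: long annotations
    may wrap over several lines).
    """
    rows, pending = [], []
    for line in text.splitlines():
        match = TIME_LINE.match(line)
        if match:
            if pending:
                speaker, _, words = " ".join(pending).partition(":")
                rows.append("\t".join(
                    [match.group(1), match.group(2),
                     speaker.strip(), words.strip()]))
            pending = []
        elif line.strip():
            pending.append(line.strip())
    return "".join(row + "\n" for row in rows)


def convert_folder(src: Path, dst: Path) -> None:
    """Convert every .txt export in `src`, writing one file each to `dst`."""
    dst.mkdir(parents=True, exist_ok=True)
    for path in sorted(src.glob("*.txt")):
        merged = merge_timestamps(path.read_text(encoding="utf-8"))
        (dst / path.name).write_text(merged, encoding="utf-8")
```

Calling `convert_folder(Path("exports"), Path("merged"))` on a folder of exported transcripts would keep one output file per input file, which fits the need for 1000 separate text files.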
With best wishes, and thanks again,
Zoë
Hi Zoë, I’ll add this (sorted tab-delimited export) to the wish list. I believe there are already a few other requests concerning that export function and we might have to split that export window into a “multiple step” window (“wizard”) to prevent further ‘cluttering’ of the user interface. So, we’ll have to see if and when we can work on this.
Hi Han,
That’s great to hear, thank you very much.
So before we close, just to check I have understood it correctly: there is currently no way to label annotations by participant name as part of an export that is ordered linearly by time?
With thanks again for your help,
Zoë
Hi Zoë,
Yes, that’s right (with an addition like this: “… an export that is ordered linearly by time and with all information (per annotation) on a single line or row”).
Best,
Han
Hi Han,
Thanks very much for responding, understood.
With best wishes,
Zoë