Delimiter recognition failure for speech segments

ayushiaggarwal · 8 February 2019 21:57

Hello,

I am trying to view a TSV file (containing transcription data for code-mixed speech, primarily Hindi and English) that also has colons within the transcription. ELAN chooses the colon as the default delimiter for the file. When I change the delimiter to ‘tab’, ELAN does not split on tabs consistently. In a row with 8 tab characters, none of the tabs is recognized as a delimiter, so the entire row ends up in a single column instead of the expected 9 columns.

Is this a known bug? Please let me know if sharing a sample file will help with this problem. I could email it to you separately since I am unable to attach it here.
I look forward to hearing back from you soon.

Thank you,
Ayushi

hasloe · 11 February 2019 11:12

I don’t think this problem has been reported before, but I happened to run into it recently when trying to improve the import of tab-delimited or csv files.

It should be fixed in the next release. If you want, you can send a sample file to me so that I can check whether the fix really works (han.sloetjes AT mpi.nl).

-Han

ayushiaggarwal · 12 February 2019 02:28

Hello Han,

I sent a sample file to you at the above email address. I am looking for confirmation on whether the fix works on the sample file.

Also, when will the next release be available?

Thank you,
Ayushi Aggarwal

hasloe · 12 February 2019 11:18

I’m afraid the sample file was not very useful; it contains only two rows (lines) and the second line does not contain any tabs or other suitable delimiter, only white spaces.
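
A quick way to check a file for this before importing or sending it is to count the tab characters on each line. The Python sketch below is only an illustration and is not part of ELAN; the function names `tab_counts` and `lines_without_tabs` are made up for this example, and the file is assumed to be UTF-8 text.

```python
def tab_counts(text):
    """Return the number of tab characters on each non-blank line."""
    return [line.count("\t") for line in text.splitlines() if line.strip()]


def lines_without_tabs(text):
    """Return the 1-based numbers of non-blank lines that contain no tab."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line.strip() and "\t" not in line
    ]


# Usage (hypothetical file name, assumed to be UTF-8 encoded):
#   with open("sample.tsv", encoding="utf-8") as handle:
#       text = handle.read()
#   print(tab_counts(text))          # a 9-column row should show 8
#   print(lines_without_tabs(text))  # lines separated by spaces only
```

If a row that should have 9 columns reports fewer than 8 tabs, the tabs were probably converted to spaces somewhere along the way, for example by an editor or by copying and pasting.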

It is always difficult to estimate when the next release will be ready, but a new (test) version might be available within one or two months, and it will include the fixes and changes that have been implemented by then.

But if you create a text file with tab-delimited lines, it can be imported into the current (and previous) version(s).
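
For illustration, one way to produce such a file from existing data is sketched below in Python. This is not an ELAN tool: the function name `to_tab_delimited` and the column layout (begin time, end time, transcription) are assumptions made for the example. Tabs and line breaks inside a field are replaced by spaces so that they cannot be mistaken for delimiters; colons in the transcription are left untouched.

```python
def to_tab_delimited(rows):
    """Join each row's fields with a single tab, one row per line.

    Tabs and newlines inside a field are replaced by spaces so that every
    output line has exactly len(row) - 1 tab characters.
    """
    lines = []
    for row in rows:
        cleaned = [
            str(field).replace("\t", " ").replace("\r", " ").replace("\n", " ")
            for field in row
        ]
        lines.append("\t".join(cleaned))
    return "\n".join(lines) + "\n"


# Hypothetical rows: begin time, end time, transcription text.
rows = [
    ("0.00", "1.50", "haan: that is right"),
    ("1.50", "3.20", "theek hai\tokay"),
]

# Usage (hypothetical output file name):
#   with open("transcript.txt", "w", encoding="utf-8", newline="") as handle:
#       handle.write(to_tab_delimited(rows))
```

Writing the file from a script rather than a word processor or spreadsheet avoids the tabs being silently replaced by spaces.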

-Han