Custom token delimiter should be optional, not required, for CJK texts

Chinese, Japanese, and Korean (CJK) texts have no spaces between words, so it should be possible to leave the custom delimiter field empty in the Tier->Tokenize Tier dialog. If the field is left empty, the text should be tokenized character by character.
For testing, here is some sample Chinese text:
所以ELAN 软件应该是分析多模态 隐喻中不同模态的比较理想的工具。使用ELAN 之前,首先要对分析的模态进行赋码。
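
To make the requested behavior concrete, here is a minimal sketch in Python. It is not ELAN code; the function name `tokenize` and the choice to skip whitespace characters are my own assumptions for illustration. With a delimiter it splits on that delimiter, and with an empty delimiter it falls back to one token per character.

```python
def tokenize(text, delimiter=""):
    """Hypothetical helper illustrating the request, not ELAN's implementation.

    With a non-empty delimiter, split on it and drop empty tokens.
    With an empty delimiter, return one token per character,
    skipping whitespace (an assumption; the request does not specify this).
    """
    if delimiter:
        return [token for token in text.split(delimiter) if token]
    return [char for char in text if not char.isspace()]


if __name__ == "__main__":
    # Empty delimiter: character-by-character tokens
    print(tokenize("使用ELAN 之前"))
    # ['使', '用', 'E', 'L', 'A', 'N', '之', '前']
```

Note that in this simple form Latin words such as "ELAN" and punctuation marks are also split into single characters, which follows the request literally but may not be what every annotator wants.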

Thanks for pointing this out; it’s true that currently there’s no way to tokenize CJK text character by character. We’ll add this to the request list.



I checked whether any tokenizers for CJK languages are available in the WebLicht infrastructure. One tokenizer produces output like this:


I'm not sure whether this is useful, but if you are interested, you could first try a few texts in the WebLicht web interface. You may have to request an account if your institution is not listed as an identity provider.

If the online results seem useful, I can help you try running a so-called toolchain from within ELAN.