Hi. Is there any way to control the settings of the “Fine audio segmentation for splitting audio into utterance level segments” recognizer dynamically, especially the “percentage of frames considered as low-energy frames” value?
I ask because when I use it on a file with multiple speakers, the settings are either optimized for one speaker or set as a compromise between all of them. My idea was to create a separate tier with annotations whose value is the speaker’s name, and then somehow use the value of that tier to control the recognizer’s parameters while it works through the file.
I’m open to a more under-the-hood solution, since I don’t see any option like this in the GUI or the documentation. I just need a hint on where to get started, if it’s possible at all.
ELAN 5.3, macOS 10.12.6
I don’t think that possibility is available, not even under the hood. It sounds like you want to help the recognizer train per-speaker acoustic models?
For each recognizer in the Recognizer tab, there is a folder in the “extensions” folder in the ELAN application folder (on macOS you have to go into the Contents of the .app folder). The audio recognizers you are referring to are in folders starting with “clam-iais”. The available settings are listed in the “recognizer.cmdi” file, which is an XML file. In some cases, settings or options have been commented out in the XML; you can make them visible by removing the comment tags. You have to relaunch ELAN after a change.
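If you want a quick overview of what has been commented out before editing anything by hand, a small script along these lines might help. This is only a rough sketch, not part of ELAN: the function names are made up for illustration, the path to the “extensions” folder has to be supplied by you (it depends on your installation), and it simply looks for XML comment blocks with a regular expression rather than parsing the file properly.

```python
import re
import sys
from pathlib import Path

# Matches XML comment blocks, including ones that span several lines.
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def commented_blocks(xml_text):
    """Return the stripped content of every XML comment in xml_text."""
    return [block.strip() for block in COMMENT_RE.findall(xml_text)]


def scan_extensions(extensions_dir, prefix="clam-iais"):
    """Map each recognizer.cmdi found in folders starting with `prefix`
    to the list of comment blocks it contains."""
    found = {}
    pattern = f"{prefix}*/recognizer.cmdi"
    for cmdi in sorted(Path(extensions_dir).glob(pattern)):
        text = cmdi.read_text(encoding="utf-8", errors="replace")
        found[str(cmdi)] = commented_blocks(text)
    return found


if __name__ == "__main__":
    # Assumption: the first argument is the path to ELAN's "extensions"
    # folder (on macOS, somewhere inside the Contents of the .app folder).
    if len(sys.argv) != 2:
        sys.exit("usage: python list_commented.py /path/to/extensions")
    for path, blocks in scan_extensions(sys.argv[1]).items():
        print(path)
        for block in blocks:
            print("  commented out:", " ".join(block.split()))
```

Not every comment it prints will be a disabled setting (some are ordinary notes), so treat the output as a list of places to look. Make a backup copy of “recognizer.cmdi” before removing any comment tags.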
You can also request a copy of the binaries of the recognizers (available for Linux) e.g. via the TLA contact form. Maybe the command line reveals more options.