Dear Divya,
In ELAN 6.9 (Apple Silicon), the new lexicon analyzer does not seem to infer the correct sense-level information for lexical entries with multiple senses, despite explicit statements in the lexicon connections. It appears to always retrieve the entry-level information plus the first sense of the entry. Information from the additional senses is not retrieved correctly.
I describe the problem below. There are a couple of workarounds, but I believe the incorrect inference is not the intended behavior of the new analyzer. An MWE (an .eaf file and an XML lexicon) can be downloaded here; the link is valid for 7 days.
Thank you for any help!
Best wishes,
Weijian
Set up
Lexicon analyzer:
| source | target1 | target2 |
|---|---|---|
| wd (word) | mb (morpheme) | ge (gloss) |
Lexicon connection:
| tier_type | lexicon_field |
|---|---|
| mb | lexical-id |
| ge | sense/gloss |
| ps | sense/grammatical-category |
| lexical_id | id |
| sense_id | sense/id |
Tier hierarchy:
| tier | stereotype | parent |
|---|---|---|
| mb (morpheme) | symbolic_subdivision | wd (word) |
| ge (gloss) | symbolic_association | mb |
| ps (part of speech) | symbolic_association | ge |
| lexical_id | symbolic_association | mb |
| sense_id | symbolic_association | ge |
Example triggering incorrect behavior
For example, the entry ‘watch’ has two senses, and they have different glosses and grammatical categories.
- sense_1: n., ‘timepiece’, sense_id = uuid_watch_sense_1
- sense_2: v., ‘look’, sense_id = uuid_watch_sense_2
During interlinearization, when sense_2 is selected, the corresponding gloss ‘look’ is retrieved, but the parser retrieves n and uuid_watch_sense_1 instead of v and uuid_watch_sense_2.
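To make the mismatch concrete, here is a minimal Python sketch of the behavior I expect versus what I observe. It is only a model of the symptom, not ELAN code: the class names, the function names, and the entry id `uuid_watch` are my own placeholders, and the dictionary keys simply mirror the tier types in the lexicon connection table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    sense_id: str
    gloss: str
    category: str


@dataclass(frozen=True)
class Entry:
    entry_id: str  # placeholder id, not taken from the real lexicon
    form: str
    senses: tuple


WATCH = Entry(
    entry_id="uuid_watch",
    form="watch",
    senses=(
        Sense("uuid_watch_sense_1", "timepiece", "n"),
        Sense("uuid_watch_sense_2", "look", "v"),
    ),
)


def expected_fill(entry, chosen):
    """Expected: every sense-level tier follows the selected sense."""
    sense = entry.senses[chosen]
    return {
        "lexical_id": entry.entry_id,
        "ge": sense.gloss,
        "ps": sense.category,
        "sense_id": sense.sense_id,
    }


def observed_fill(entry, chosen):
    """Observed: only the gloss follows the selection;
    ps and sense_id fall back to the first sense of the entry."""
    first = entry.senses[0]
    return {
        "lexical_id": entry.entry_id,
        "ge": entry.senses[chosen].gloss,
        "ps": first.category,
        "sense_id": first.sense_id,
    }


# Selecting sense_2 (index 1) shows the disagreement on ps and sense_id.
print(expected_fill(WATCH, 1))
print(observed_fill(WATCH, 1))
```

The two functions agree when the first sense is selected, which is why the problem only shows up for entries with more than one sense.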
Workaround 1
One workaround is to extract all sense-pos combinations of the same lexical item into separate entries. For example, ‘water’ has two senses, separated into two lexical units:
- water (entry_id = uuid_water_1): n., ‘h2o’, sense_id = uuid_water_h2o
- water (entry_id = uuid_water_2): v., ‘give.h2o’, sense_id = uuid_water_give.h2o
During ambiguity selection, the analyzer infers part of speech and sense_id correctly as there is only one sense associated with each lexical entry.
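For a larger lexicon, this split could in principle be scripted rather than done by hand. The following is a rough Python sketch under my own assumptions: the lexicon is represented as plain dictionaries (not the actual XML lexicon format), `split_by_sense` is a hypothetical helper, and the numbered entry ids are placeholders where a real lexicon would need freshly generated UUIDs.

```python
def split_by_sense(entry):
    """Return one single-sense entry per sense of `entry`.

    The new entry ids are placeholders built by numbering the
    original id; a real lexicon would need proper unique ids.
    """
    return [
        {
            "entry_id": f"{entry['entry_id']}_{i}",
            "form": entry["form"],
            "senses": [sense],
        }
        for i, sense in enumerate(entry["senses"], start=1)
    ]


# Hypothetical in-memory version of the polysemous entry.
water = {
    "entry_id": "uuid_water",
    "form": "water",
    "senses": [
        {"sense_id": "uuid_water_h2o", "category": "n", "gloss": "h2o"},
        {"sense_id": "uuid_water_give.h2o", "category": "v", "gloss": "give.h2o"},
    ],
}

for part in split_by_sense(water):
    print(part["entry_id"], part["senses"][0]["category"], part["senses"][0]["gloss"])
```

Since each resulting entry has exactly one sense, the first sense is always the selected one, which is why the analyzer then fills ps and sense_id correctly. The cost is that the lexicon no longer records that the two items belong to the same polysemous entry.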
Possible workaround 2
A second (possible) workaround, which does not require dividing up polysemous entries, is to configure more analyzers and manually retrieve the ‘additional’ information. While this is feasible for human-readable fields such as part of speech tags, it is not feasible for non-readable fields such as UUIDs.
