Lexicon Service and Lexus

Hi,

I am trying to set up a lexicon service, but cannot log on, since I am not sure what my username and password are. I have tried with the ones I use to log in to this site.

What do I do now? I can read about the Lexus tool, but I can’t find it.
I use version 6.4 of ELAN.

Hope you can help.
Best regards
Linea Almgren.

Hello Linea,

The Lexus lexicon tool is no longer maintained by The Language Archive and has been offline for many years now. The Lexus option is still there in ELAN, because theoretically other institutions could have a running instance of this open source application on their servers.

The SignBank option (for sign language lexicon services) is still actively used (I believe) and ELAN nowadays also has a small lexicon component built-in. May I ask what you were planning to use the Lexus service for? (In case you used to have data in Lexus in the past, it’s probably best to contact us by email.)

-Han

Hello Han,

Thanks for your reply.

I am working on the Dictionary of Danish Sign Language, and we are currently moving our corpus from another program to ELAN. In this process we have tried to establish a SignBank, but unfortunately we do not have the technical resources to develop and maintain it. When I read the manual, it seemed that it was only possible to select a LEXUS 3 or a SignBank as the Lexicon Server Type. That’s why I asked about it, but maybe I have misunderstood how it is set up.

The dictionary is a Microsoft Access file located on a separate server, and we want to make a link between ELAN and our signbase (only the signs and their glosses, not all the semantic information) to ease the process of annotating the data.
How can I set up this lexicon service? I don’t know what URL I should enter, since it is not an online signbase.
And which Lexicon Server Type should I choose, if we can’t have a SignBank and Lexus is no longer maintained?

I hope you can help me,
and thank you in advance (:

Hello Linea,

First of all, your question about Lexus is completely understandable and legitimate; after all, the option is still there. But unfortunately this is not a solution to your problem.
You mention you tried to set up your own SignBank instance; I assume you also contacted the maintainers of existing installations (e.g. as listed here) to see if the Danish SignBank could be hosted there?

Setting up your own (local) lexicon service which interacts with the Access database should be possible, but it probably requires more technical resources and expertise to develop such a service than establishing your own SignBank.

I’m wondering if creating an external controlled vocabulary, extracting the glosses and IDs from the Access database, would be sufficient for your purpose? Maybe scripts or workflows from colleagues in the sign language community can be adapted for that?

-Han

Hi again,

Actually, we didn’t ask if the SignBank could be hosted somewhere else, since even setting it up was not possible within our organisation. But we heard from other sign language dictionaries that a SignBank is quite a job to maintain. Unfortunately so, because it was exactly what we were looking for.

Our last idea is, as you suggest, to have a controlled vocabulary that is updated regularly in order to stay in sync with the signbase in Access.

I am still not sure I understand what the lexicon service actually is. Is it the possibility of creating the kind of link we need from ELAN to a “database”, or list, in order to perform a lookup? And does it need a medium, like a webpage? I’m just trying to figure out what the steps are to set up the actual lexicon that would be consulted (: even though I “hear” what you write about it being more advanced than the SignBank.

Thanks a lot!

Hello,

Concerning the lexicon service: ELAN can be extended in a few areas, one of which is a connection to a lexicon. There is a kind of “mini API” (programming interface) for that. The creator of such an extension should implement several functions, e.g. handling log-in (if applicable), listing the available lexicons, listing the entries of a selected lexicon, listing the fields of lexical entries, executing some search queries, etc.
The service doesn’t need to be a web service, and a web page is not required. The service could interact with a local (network) database or file.
These functions would indeed make it possible to do lookups and to link annotations to entries (glosses) in the database or list.
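
Purely to give an impression of the shape of such an extension, here is a minimal sketch (in Python, for illustration only; ELAN’s actual extension interface is Java-based, and all class and method names below are invented):

    # Illustrative sketch only: the kind of functions a lexicon-service extension
    # would need to provide. This is NOT ELAN's actual extension interface; all
    # class and method names are invented for the example.
    from abc import ABC, abstractmethod


    class LexiconService(ABC):
        """Operations a lexicon service would have to offer to the annotation tool."""

        @abstractmethod
        def login(self, username: str, password: str) -> None:
            """Authenticate, if the backing lexicon requires it (may be a no-op)."""

        @abstractmethod
        def list_lexicons(self) -> list[str]:
            """Return the names of the available lexicons."""

        @abstractmethod
        def list_fields(self, lexicon: str) -> list[str]:
            """Return the fields a lexical entry in the selected lexicon can have."""

        @abstractmethod
        def list_entries(self, lexicon: str) -> list[dict]:
            """Return all entries of the selected lexicon."""

        @abstractmethod
        def search(self, lexicon: str, field: str, value: str) -> list[dict]:
            """Return the entries whose field matches the given value."""
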
I can give more pointers if you decide to give that a try.

Hi,
Thank you for the answer!

We have been trying to find another, simpler way, but we haven’t come up with the perfect solution.
What we need is a way to make sure we annotate the signs correctly, and only with signs we have in our database. We also need to make sure that if we one day make corrections to the gloss of a sign, the correction will appear in all transcripts. We are currently working on a CV solution, and transcribers will then have to perform a lookup in our database if they need to check whether they chose the right sign.
But we have some problems, for example:

  1. It would be nice to see the ID number for each gloss when it is selected from the drop-down menu while annotating, to minimize the risk of choosing the wrong sign, but I can’t get two columns into the CV (as shown in Figure 2.35, Edit CV Languages, in the manual - even though my problem is not about a new language, but about the possibility of having two columns).
  2. We would have to manually add signs and corrections in the Edit Controlled Vocabularies window - or is there another way to align what we have in the database with a CV? As I understand the CV, it would also be difficult to run checks to compare the CV with our signbase in order to make sure they were identical. Or do you see a way?
    Ideally we would want to make corrections in only one place, the database, and then those corrections and additions would be applied wherever the CV is in use.

If you have any suggestions for how to make this work, I would really appreciate it.
If the solution is to try and get the lexicon service up and running, then I would be happy to hear your pointers and advice!

Thank you in advance.
Linea.

Hi,

It’s only now that I notice that the numbering of the figures differs between the manual included in ELAN and the online version of the manual; we will have to look into that. I assume you are referring to figure 2.35 in the manual inside ELAN; that image illustrates that it is possible to create multilingual CVs.

A better match for what you probably need is figure 2.50 (the right half of it), which shows that it is possible to display the description of each CV entry in a second column when editing an annotation. In the Preferences->CV panel there is an option to activate this second column.

Based on your description I think it is best for you to use an external CV, which is generated from the database you have. The description field of each CV entry could then contain the ID number (and possibly other useful information), supporting the annotators when trying to select the right gloss. You would need to create a script to extract fields from the database and create the external CV .ecv file.

The following steps would be required for this scenario:

  • creation of a script which produces (at regular intervals or after changes in the database) a new version of the .ecv file
  • the .ecv should be uploaded to a fixed location on the internet or intranet where it is accessible to all annotators
  • the .eaf files should not contain local, internal CVs anymore (at least not for these glosses) but should link to the external CV(s)
  • when an .eaf is opened, it will check if there are changes in the ECV and update annotations where needed
  • transcription files can also be updated batch-wise with the function File->Multiple File Processing->Update Transcriptions for ECVs...

I hope this is all correct. Creation of the .ecv from the database might be the main challenge.
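
To make the export step a bit more concrete, below is a rough sketch of such a script in Python. It assumes pyodbc and the Microsoft Access ODBC driver are available; the table and column names, file locations and CV name are placeholders, and it is best to compare the generated file with a small .ecv saved by ELAN itself to verify the wrapper elements.

    # Sketch of an export script: reads glosses from the Access signbase and writes
    # an ELAN external controlled vocabulary (.ecv) file. All names marked as
    # placeholders below need to be adapted to the actual database and environment.
    import datetime
    import xml.etree.ElementTree as ET

    import pyodbc  # requires the Microsoft Access ODBC driver (Windows)

    ACCESS_FILE = r"\\server\share\signbase.accdb"  # placeholder path to the database
    OUTPUT_ECV = "danish_sign_glosses.ecv"          # the file to upload for the annotators
    CV_ID = "DTS-glosses"                           # placeholder name of the CV in ELAN


    def read_glosses():
        """Return (id, gloss) pairs from the signbase; table/column names are placeholders."""
        conn_str = (
            r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};"
            rf"DBQ={ACCESS_FILE};"
        )
        with pyodbc.connect(conn_str) as conn:
            cursor = conn.cursor()
            cursor.execute("SELECT id, gloss FROM signs ORDER BY gloss")
            return [(str(row.id), str(row.gloss)) for row in cursor.fetchall()]


    def build_ecv(entries):
        """Build the .ecv document; the wrapper elements follow an .ecv saved by ELAN."""
        root = ET.Element("CV_RESOURCE", {
            "AUTHOR": "",
            "DATE": datetime.datetime.now().astimezone().isoformat(timespec="seconds"),
            "VERSION": "0.2",
            "xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
            "xsi:noNamespaceSchemaLocation": "http://www.mpi.nl/tools/elan/EAFv2.8.xsd",
        })
        ET.SubElement(root, "LANGUAGE", {
            "LANG_DEF": "http://cdb.iso.org/lg/CDB-00130975-001",
            "LANG_ID": "und",
            "LANG_LABEL": "undetermined (und)",
        })
        cv = ET.SubElement(root, "CONTROLLED_VOCABULARY", {"CV_ID": CV_ID})
        description = ET.SubElement(cv, "DESCRIPTION", {"LANG_REF": "und"})
        description.text = "Glosses exported from the signbase"
        for gloss_id, gloss in entries:
            # The gloss becomes the entry value; the database ID goes into the
            # description so it can be shown in the second column while annotating.
            # The "cveid" prefix keeps the CVE_ID a valid XML name.
            entry = ET.SubElement(cv, "CV_ENTRY_ML", {"CVE_ID": f"cveid{gloss_id}"})
            value = ET.SubElement(entry, "CVE_VALUE", {
                "DESCRIPTION": gloss_id,
                "LANG_REF": "und",
            })
            value.text = gloss
        return ET.ElementTree(root)


    if __name__ == "__main__":
        tree = build_ecv(read_glosses())
        ET.indent(tree)  # Python 3.9+
        tree.write(OUTPUT_ECV, encoding="UTF-8", xml_declaration=True)

Running such a script on a schedule (or after every change in the database) and writing the result to the fixed location mentioned above would keep the external CV in sync with the signbase.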

Thank you very much, this was a great help!

Hi Han,

I have two questions regarding an ECV and imported transcriptions from other programs.

  1. An ECV entry has both a value and a description. Which field is meant for the actual annotation (in our case a gloss) and which is meant for the gloss’s ID (in our case a number)?
  2. We are working on importing our corpus of transcriptions from another program (iLex) into ELAN, and the glosses used correspond to the new ECV we are creating. When we extract the data from iLex, should we extract both gloss and ID, or will the gloss suffice, as it matches the glosses in the ECV and should have the same ID? And, if so (if the gloss is enough), how do we update the ECV references after the import?

Thank you in advance,
Linea Almgren

Hi Linea,

I’ll try to provide some answers:

  1. the value of a CV entry is the part that is used when applied to the annotation, so I guess in your case the gloss should be the value of the CV entry. The gloss ID could go into the description property. Depending on the format of the ID, it might also be possible to use the gloss ID for the CVE_ID property (but this has to be a valid XML ID).

  2. If I understand correctly, just extracting the gloss value would be sufficient. After the conversion to EAF, you could use the function File->Multiple File Processing->Update Transcriptions for ECVs... to add or update the reference of annotations to the correct CV entry (and with that to the IDs). The Don't change the annotation value etc... option can be ticked for that purpose. This assumes that the relevant tiers, via their tier type, are linked to the correct ECV and that the .ecv file is properly referenced. It also assumes that the gloss values are unique (otherwise manual disambiguation would be required afterwards). If the latter is not the case, it would be best to also extract and apply the gloss’s ID (and use it for the CVE_REF attribute of annotations, keeping in mind the last remark of point 1).

I hope this helps. You’re welcome to send me example files if you run into issues in the conversion process.

Best,
Han

Hi Han,

Thank you for your reply.

  1. The IDs are static, unique integers. What is a CVE_ID?
  2. I am not sure I understand this. What we need is to be able to edit glosses in the ECV, and the corrections should then be applied to all transcription files via the ID.

I am afraid we can’t really explain our problem properly here, but maybe we could have a short online meeting to clarify it, and whether we are going in the wrong direction?

Please contact me on my email: liva@kp.dk if you are interested (:

Best wishes,

Linea.

Hi Linea,

The XML of an entry in an ECV looks something like this:

        <CV_ENTRY_ML CVE_ID="cveid0">
            <CVE_VALUE DESCRIPTION="a description" LANG_REF="und">a value</CVE_VALUE>
        </CV_ENTRY_ML>

The CVE_ID is a unique identifier, but (I checked) its type is just a string, not an XML ID type, which would put special constraints on its format. So I guess the unique integers you have can be used for the CVE_ID fields as they are.

It will be possible to edit glosses in the ECV and update transcriptions (assuming the IDs remain unchanged). Transcriptions are updated to include changes in the ECV when they are opened in ELAN, or batch-wise when the Update Transcriptions for ECVs... function is run.
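
If at some point you want to double-check that the transcriptions are still consistent with the current ECV (e.g. after changes in the database), a small script along the lines of the sketch below could do that. It only looks at the CVE_REF attributes mentioned earlier; the file locations are again placeholders.

    # Consistency check (sketch): report annotations whose CVE_REF does not exist
    # as a CVE_ID in the current .ecv file. File locations are placeholders.
    import glob
    import xml.etree.ElementTree as ET

    ECV_FILE = "danish_sign_glosses.ecv"  # the external CV generated from the signbase
    EAF_PATTERN = "corpus/*.eaf"          # placeholder location of the transcriptions

    # All CVE_IDs defined in the ECV.
    ecv_ids = {
        entry.get("CVE_ID")
        for entry in ET.parse(ECV_FILE).getroot().iter("CV_ENTRY_ML")
    }

    # Walk through the annotations of each .eaf and check their CVE_REF attributes.
    for eaf_path in sorted(glob.glob(EAF_PATTERN)):
        root = ET.parse(eaf_path).getroot()
        for annotation in root.iter():
            cve_ref = annotation.get("CVE_REF")
            if cve_ref is not None and cve_ref not in ecv_ids:
                value_element = annotation.find("ANNOTATION_VALUE")
                value = value_element.text if value_element is not None else ""
                print(f"{eaf_path}: annotation '{value}' refers to unknown CVE_ID '{cve_ref}'")
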

I’ll contact you by email; it will probably indeed be helpful to discuss things in an online meeting.

Best,
Han