Data

Revisions in 2009

The web site has been updated to include some references to the Sign Linguistics Corpora Network, in particular the workshop on metadata that took place in November 2009.

Additions to the ECHO corpus in 2007

A brief description of the ECHO data set

The corpus consists of linguistically annotated sign language data from three sign languages: Sign Language of the Netherlands (NGT), British Sign Language (BSL), and Swedish Sign Language (SSL). For each of these three languages, we have recorded signed narrations of the same five fables, a small lexicon, and interviews with the signers. In addition, there is sign language poetry from BSL and NGT. Finally, the corpus includes two annotated segments of the Gehörlos So! corpus of German Sign Language (DGS) by Jens Heßmann.

Every session in the corpus consists of one or more video files, an associated ELAN transcription file, and a metadata description in the form of an IMDI file.

How to obtain and use the data

There are two ways to access the data in the corpus: the first lets you see a description of the data in the IMDI format, while the second gives you direct access to the movie and annotation files only. In both cases, to view movies with annotations in ELAN, you need to download one or more movie files, plus the annotation document.

  1. Access the corpus through your web browser; the address is http://corpus1.mpi.nl/ds/imdi_browser?openpath=MPI84302%23. The IMDI browser will open the ECHO corpus and select the sign language node; a copy of that node can be found under the MPI collection 'Sign Language'.
  2. Download files directly from the table below, without viewing the metadata descriptions. This option is currently not available; we hope to restore it before the end of 2008.

To view an annotation file in combination with the video file(s), install the ELAN annotation software on your computer (see the technology page). To be able to look at the ELAN annotation files, you will have to download both the annotation and the media file(s) to your own computer, and make sure these files are in the same directory/folder. Also, ensure that the extension of the annotation file is ".eaf", and not ".xml" or ".eaf.xml". Some browsers and computers automatically attach the extension ".xml" to an annotation document. If this happens, ELAN will not recognize the presence of that annotation document on your machine.
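The same-folder requirement described above can be checked with a short script. The following is a minimal Python sketch, not part of the ECHO or ELAN tooling. It assumes that an ".eaf" file is an XML document that lists its media files in MEDIA_DESCRIPTOR elements with a MEDIA_URL attribute; the function names linked_media_names and missing_media are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlparse


def linked_media_names(root: ET.Element) -> list[str]:
    """Return the file names of the media files an annotation document points to.

    Assumption: media files are listed in MEDIA_DESCRIPTOR elements
    that carry a MEDIA_URL attribute.
    """
    names = []
    for descriptor in root.iter("MEDIA_DESCRIPTOR"):
        url = descriptor.get("MEDIA_URL", "")
        # Keep only the last path component, whatever folder the URL names.
        name = PurePosixPath(unquote(urlparse(url).path)).name
        if name:
            names.append(name)
    return names


def missing_media(eaf_path: Path) -> list[str]:
    """List linked media files that are not in the same folder as the .eaf file."""
    root = ET.parse(str(eaf_path)).getroot()
    folder = eaf_path.parent
    return [name for name in linked_media_names(root) if not (folder / name).exists()]
```

Calling missing_media(Path("some_folder/some_file.eaf")) on a downloaded annotation file (the path here is a placeholder) returns the names of any linked movie files that still need to be placed next to it.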

To interpret the transcriptions correctly, please refer to the transcription conventions (PDF document). In addition, there are two separate documents describing the transcription conventions for the mouth behaviour, one for BSL/NGT and one for SSL.

Using and referring to these data

All movies and annotation files in the corpus are licensed under a Creative Commons licence. You are encouraged to use these data and the transcriptions for your own research, but you are not allowed to re-publish parts of the data elsewhere on the internet or in any other form. Thus, please always link to the original URL when referring to the data: http://www.let.ru.nl/sign-lang/echo/.

In publications, please refer to the authors of the data in the manner indicated in the IMDI files and in the table below:

Language: Reference for data
NGT: O. Crasborn, E. van der Kooij, A. Nonhebel & W. Emmerik (2004) ECHO data set for Sign Language of the Netherlands (NGT). Department of Linguistics, Radboud University Nijmegen. http://www.let.ru.nl/sign-lang/echo
SSL: B. Bergman & J. Mesch (2004) ECHO data set for Swedish Sign Language (SSL). Department of Linguistics, University of Stockholm. http://www.let.ru.nl/sign-lang/echo
BSL: B. Woll, R. Sutton-Spence & D. Waters (2004) ECHO data set for British Sign Language (BSL). Department of Language and Communication Science, City University (London). http://www.let.ru.nl/sign-lang/echo

Direct links to movie and annotation files (Links are currently being updated)

The table with links to the media and annotation files in the ECHO corpus is currently being updated; in the meantime, please use the IMDI Corpus Browser (access method 1 above) to obtain parts of the corpus.

To download the files directly without displaying them in the browser, control-click or right-click on the link and select 'Save as...' from the pop-up menu. Please note that some of the movie files are very large and may take a long time to download. If the file size is over 100 MB, this is indicated in the table.

You can read the ELAN section of this site for some first hints on how to go about using ELAN. If you only download one of the movie files, ELAN will issue a warning about incomplete media, but you can still use the annotation document. The corpus browsers offer some additional movie files that are not currently linked to the annotation files.

File naming conventions (used for all files except the DGS data):

The first three letters refer to the language (e.g. NGT)

The next two letters refer to the signer (e.g. WE)

The following letters refer to the type of data (e.g. poems)

The movie files have a final letter for the camera perspective:

The extension refers to the file type:
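The naming scheme above can be read programmatically. The following is a minimal Python sketch under stated assumptions: the file name "NGT_WE_poems.eaf" is a hypothetical example, separators such as "_" are assumed to carry no meaning, and the helper parse_echo_name is not part of any ECHO tooling. Because the list of camera-perspective letters is not given here, the sketch leaves everything after the signer code in one field instead of guessing how to split it.

```python
import re
from pathlib import PurePath
from typing import NamedTuple


class EchoName(NamedTuple):
    language: str   # first three letters, e.g. "NGT"
    signer: str     # next two letters, e.g. "WE"
    rest: str       # type of data, plus the camera letter for movie files
    extension: str  # file type, without the leading dot


def parse_echo_name(filename: str) -> EchoName:
    """Split a file name according to the naming conventions above."""
    path = PurePath(filename)
    # Assumption: characters such as "_" or "-" are separators only, so drop them.
    letters = re.sub(r"[^A-Za-z0-9]", "", path.stem)
    if len(letters) < 5:
        raise ValueError(f"name too short for the naming scheme: {filename!r}")
    return EchoName(letters[:3], letters[3:5], letters[5:], path.suffix.lstrip("."))
```

For the hypothetical name "NGT_WE_poems.eaf", this yields the language "NGT", the signer "WE", the data type "poems", and the extension "eaf". The DGS files do not follow this scheme and should not be passed to it.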

Warning: ELAN file extension

When you download ELAN files using your web browser, some browsers automatically change the extension of the file to ".xml" (or add the extension ".xml"), rather than using ".eaf" as in the table below. As a result, ELAN will not be able to open the files. To recover from this problem, manually change the extension of the annotation files back to ".eaf".
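If many files are affected, the renaming can be scripted. The following is a minimal Python sketch with hypothetical function names (corrected_name, fix_extensions). It assumes that the folder contains only downloaded ECHO files, since it treats every ".xml" file in that folder as a mislabelled annotation document.

```python
from pathlib import Path


def corrected_name(name: str) -> str:
    """Map "x.eaf.xml" and "x.xml" to "x.eaf"; leave other names unchanged."""
    lower = name.lower()
    if lower.endswith(".eaf.xml"):
        return name[: -len(".xml")]
    if lower.endswith(".xml"):
        return name[: -len(".xml")] + ".eaf"
    return name


def fix_extensions(folder: Path) -> list[Path]:
    """Rename mislabelled annotation files in `folder`; return the new paths."""
    renamed = []
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue
        target = path.with_name(corrected_name(path.name))
        # Never overwrite a file that already has the corrected name.
        if target != path and not target.exists():
            path.rename(target)
            renamed.append(target)
    return renamed
```

Running fix_extensions(Path("downloads")) on the download folder (the folder name is a placeholder) renames the affected files in place and leaves movie files untouched.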


Creative Commons License
The movies and annotation files are licensed under the Creative Commons License 'BY-NC-SA': you may use them for non-commercial purposes if you refer to the author(s) of the work in the form indicated above.