Collection development policy of The Language Archive

The goal of The Language Archive is to provide a unique record of how people around the world speak in everyday family life. It focuses on collecting spoken and signed language materials in audio and video form along with transcriptions, analyses, annotations and other types of relevant ancillaries (e.g. photos, accompanying notes). The archive includes speech data from everyday interactions in families and communities, focussing on families with children, but also including naturalistic data from adult conversations from under-studied languages and linguistic phenomena. The archive may accept good-quality data that falls under this policy from external depositors; requests will be evaluated on a case-by-case basis.

If you are interested in depositing your materials with us as an external depositor, please consider the following:

  • Deposits will only be accepted if they are accompanied by good-quality metadata records using one of the accepted CMDI metadata profiles (see list here).
  • Data will in principle only be accepted if they are in archival formats (see list here). File format conversion will not normally be done by us; please contact us if you wish to discuss this.
  • It is expected that external depositors will themselves use the web-based deposit facility of the archive to get their materials archived. We do not have the resources to archive your materials for you.
  • It is expected that the large majority of the deposited data will be made available with no or only minimal access restrictions (freely available, available to all registered users or to all academic users).
  • We can only accept data from depositors who a) have received permission to share from their participants (informed consent) and b) own the copyright to the data or have received permission from the copyright holder to archive and share the data through The Language Archive.

So that we can evaluate your request, please send us a brief description (one page) of your collection that answers the following questions:

  • Which language(s) does it contain? In the case of under-studied languages, what is the situation of the language in terms of its vitality? Note that we give priority to corpora that include naturalistic conversations between members of the community, and/or under-studied languages.
  • What is the nature of the recordings? (natural interactions, elicited speech, interviews, monologues, story-telling, etc.)
  • Does the collection contain child and/or adult speech, or a mixture of both? In the case of child speech, what is the approximate age of the children?
  • How much data do you have, or do you expect to collect? (in GB or in hours of audio and/or video)
  • In which file formats are the data?
  • Have you already created metadata descriptions for your collection and if so, in which format?
  • What percentage of your recordings has been transcribed/translated/analysed?
  • Are there any issues in relation to copyright, informed consent or access restrictions?