User Guide

The Language Archive - User Guide

Jeroen Geerts

December 2020

1 Introduction

The Language Archive (TLA) is an integral part of the Max Planck Institute for Psycholinguistics in Nijmegen. The archive contains various types of materials, including: audio and video language corpus data from languages around the world; photographs, notes, experimental data, and other relevant information required to document and describe languages and how people use them; records of speech in everyday interactions in families and communities; naturalistic data from adult conversations from endangered and under-studied languages, and linguistic phenomena. Currently, the Language Archive contains more than 350 collections, covering over 250 different languages that are spoken around the world.

This user guide describes how to navigate the TLA archive and how to access and view resource data. Below we explain a number of important concepts that are used throughout this guide:

  • Metadata: Metadata is information about the archived materials that allows others to discover and re-use them. TLA uses the CMDI metadata framework as a standard for its descriptive metadata. The archive supports a selected number of CMDI profiles that are listed here.

  • Collection: a Collection in the archive is used to group Bundles or other Collections. This enables us to create hierarchical Collection structures. A Collection also has descriptive metadata.

  • Bundle: a Bundle in the archive contains one or more data files (e.g. audio and video data) and their associated metadata. Typically, all files that are linked to the same Bundle should have a logical relation to one another, e.g. a video recording and its transcript, all trials for a given experimental task for a given subject, or all photographs of a given event. This typically means that the metadata for the Bundle applies to all files within the Bundle, e.g. in terms of date, location and participants.

  • Persistent Identifier (pid): each Collection, Bundle and resource has its own persistent identifier, a handle link. This is a unique link which resolves directly to a Collection, Bundle or resource in the archive.
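The relation between these concepts can be sketched as a small data model. This is only an illustration of the description above, not the archive's actual implementation: the class names, the fields and the `hdl:0000/...` identifiers are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Resource:
    pid: str       # persistent identifier (handle) of the file
    filename: str


@dataclass
class Bundle:
    pid: str
    title: str
    resources: List[Resource] = field(default_factory=list)


@dataclass
class Collection:
    pid: str
    title: str
    # A Collection may group Bundles and/or other (nested) Collections.
    members: List[Union["Collection", Bundle]] = field(default_factory=list)


def count_resources(node: Union[Collection, Bundle]) -> int:
    """Count all files below a Collection or Bundle, following the nesting."""
    if isinstance(node, Bundle):
        return len(node.resources)
    return sum(count_resources(member) for member in node.members)


# Hypothetical example data; the pid values are made-up placeholders.
session = Bundle(
    pid="hdl:0000/bundle-1",
    title="Conversation 1",
    resources=[
        Resource("hdl:0000/file-1", "conversation1.mp4"),
        Resource("hdl:0000/file-2", "conversation1.eaf"),
    ],
)
sub_collection = Collection("hdl:0000/coll-2", "Sub-collection", members=[session])
corpus = Collection("hdl:0000/coll-1", "Example corpus", members=[sub_collection])

print(count_resources(corpus))  # prints 2
```

The video recording and its annotation file share one Bundle because they are logically related, and the Bundle is reachable through two levels of Collections.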

2 Navigation

The Language Archive has a layout that is easy to navigate. The main page consists of a header bar with a general menu, a sub-menu for browsing the archive, a search box and a highlighted selection of Collections found in the archive.

The main menu, found at the top of the page, contains a link to ELAN, our annotation software. There’s also a link to the user forums for support on ELAN and archive-related matters. Lastly, there is a help section, which includes documentation and a contact form.

Login and account creation can be done via the ’Login’ link on the right. A link to visit the MPI for Psycholinguistics Archive is found here as well. This archive contains research data produced by departments of the MPI for Psycholinguistics.

Navigation of the archive can be done in several ways:

  1. browsing highlighted collections or departments

  2. browsing the complete archive

  3. browsing by a collection, genre, language... (using filters)

  4. searching by specific terms


2.1 Browsing

Clicking ’Browse Archive’ will take you to the main archive page. This page displays a list of Collections and/or Bundles (1) which can be browsed, an overview of access levels for resource data within these Collections (2) and additional filters that can be applied to the listed Collections or Bundles (3).
Note that access levels can also be applied as a filter. When browsing a certain Collection, the available filters will change accordingly. A search box (4) is available to search through all of the Collections in the archive.

Browse archive page
  1. list of Collections & Bundles

  2. access levels (see section 2.3)

  3. filters (see section 2.2)

  4. search (see section 2.4)

2.2 Filters

Filters can be applied when browsing the archive to narrow down your search. This is done either by clicking the filter value (e.g. the value ’Dutch’) or by clicking the + sign, located next to each filter. Certain filter values can be excluded from your search by clicking the - sign.
Each selected filter is applied immediately and the list of results is updated right away. The list of chosen filters is visible in a breadcrumb path just above the Collection/Bundle list view. You can remove any active filter there by clicking the X. Excluded filters are shown in the breadcrumb path with strike-through text.
Access levels can also be applied as a filter, allowing you to browse only Bundles containing openly accessible data, for instance. Access levels are described in more detail in section 2.3.

Breadcrumb-view of selected and excluded filters
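The effect of included (+) and excluded (-) filters can be illustrated with a short sketch. It assumes that several included filters are combined with a logical AND, which this guide does not state explicitly; the function name `apply_filters` and the facet names are hypothetical.

```python
from typing import Dict, List, Set

# Each Bundle is represented by its filter values, e.g. {"language": {"Dutch"}}.
Facets = Dict[str, Set[str]]


def apply_filters(bundles: List[Facets], include: Facets, exclude: Facets) -> List[Facets]:
    """Keep Bundles that have every included value and none of the excluded ones."""
    results = []
    for bundle in bundles:
        # Assumption: all included values must be present (logical AND).
        has_all = all(values <= bundle.get(facet, set())
                      for facet, values in include.items())
        # A single excluded value is enough to drop the Bundle.
        has_excluded = any(values & bundle.get(facet, set())
                           for facet, values in exclude.items())
        if has_all and not has_excluded:
            results.append(bundle)
    return results


# Hypothetical example data.
bundles = [
    {"language": {"Dutch"}, "genre": {"Conversation"}},
    {"language": {"Dutch"}, "genre": {"Narrative"}},
    {"language": {"English"}, "genre": {"Conversation"}},
]
hits = apply_filters(
    bundles,
    include={"language": {"Dutch"}},
    exclude={"genre": {"Narrative"}},
)
print(len(hits))  # prints 1
```

Here the filter ’Dutch’ is selected and the value ’Narrative’ is excluded, so only the first Bundle remains in the list.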

2.3 Access Levels

The Language Archive distinguishes 4 different access levels. These indicate the general accessibility of files, rather than the specific accessibility for any given user. Because different files within a Bundle can have different access levels, the access level filter will show results if at least one file within a Bundle has a given access level.

The 4 access levels are as follows:

  • Open. These files can be accessed directly by anyone without having to log in or register.

  • Registered. These files can be accessed directly by anyone who is logged in with a valid user account. This can be an account specifically for The Language Archive, or an account from one of the associated academic organisations when using the "Login via Shibboleth" option. See section 3.3 for how to create an account.

  • Academic. These files can be accessed directly by all academic users. These are users that log in with an academic account (via Shibboleth) or whose academic status has been (manually) verified.

  • Restricted. Access to these files needs to be requested by using the contact form. See section 3.1 for more information.

Please note that the access level says nothing about the licensing terms for the resource. E.g. files that can be accessed by anonymous users typically still have some restrictions in terms of their usage or re-distribution, i.e. they are generally not in the "Public Domain". Please contact us with any questions you may have about usage or re-distribution.
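The access rules in this section can be summarised in a minimal sketch. It is a simplified model for illustration only, not the archive's actual access logic: the names `AccessLevel`, `bundle_matches_filter` and `can_access` are hypothetical, and the `granted` flag stands for an access request that has been approved.

```python
from enum import Enum
from typing import Iterable


class AccessLevel(Enum):
    OPEN = "open"
    REGISTERED = "registered"
    ACADEMIC = "academic"
    RESTRICTED = "restricted"


def bundle_matches_filter(file_levels: Iterable[AccessLevel], wanted: AccessLevel) -> bool:
    """A Bundle is listed if at least one of its files has the wanted level."""
    return any(level == wanted for level in file_levels)


def can_access(level: AccessLevel, logged_in: bool, academic: bool,
               granted: bool = False) -> bool:
    """Decide whether a user can access one file directly."""
    if level is AccessLevel.OPEN:
        return True                    # no login or registration needed
    if level is AccessLevel.REGISTERED:
        return logged_in               # any valid user account
    if level is AccessLevel.ACADEMIC:
        return logged_in and academic  # academic account or verified status
    # RESTRICTED: assumed to require a login plus an approved request.
    return logged_in and granted


# A Bundle with one open file and one restricted file.
levels = [AccessLevel.OPEN, AccessLevel.RESTRICTED]
print(bundle_matches_filter(levels, AccessLevel.OPEN))      # prints True
print(bundle_matches_filter(levels, AccessLevel.ACADEMIC))  # prints False
```

Because the filter looks at individual files, the example Bundle appears under both the ’Open’ and the ’Restricted’ filter, even though an anonymous user can only access one of its two files.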

Access level information can also be found here.

2.4 Searching

A search box is available on every page within the archive. You can quickly search the entire archive for a specific term; results are sorted by relevance. Note that the search currently does not allow you to search within a specific Collection: it is always a site-wide search.
You can apply filters to the search results to narrow down what you are looking for. The search query is also listed in the breadcrumb path, and can be removed if need be. An advanced search tool is currently being developed.

Search results for ’Bread’ with additional filters applied

2.5 Collections & Bundles

The Language Archive consists of Collections and Bundles. A Collection in the archive is used to group Bundles or other Collections. A Collection also contains descriptive metadata. A Bundle contains one or more data files (e.g. audio and video data) and their associated metadata. Each Collection, Bundle and resource has a persistent identifier. This direct link can be used to access or refer to the page or resource directly.

Each Bundle-overview page displays a selection of metadata details relating to the resources inside the Bundle. The metadata can be viewed in more detail and downloaded on the lower half of the page.

Bundle overview
  1. Bundle pid

  2. Metadata

  3. Resources

A list of resources is displayed on the right, together with their access level labels. A certain resource may be openly accessible, whereas other files in the same Bundle may have restricted access. Access to these files can be requested; see section 3.1. A download-icon next to the resource indicates whether downloading is permitted. Clicking on a resource will take you to the resource page. The next section goes into detail about viewing different kinds of resources.

2.6 Viewing Resources

Clicking a resource from the list on the Bundle page opens the resource overview-page. Some resources can be viewed directly from within the browser, depending on the file-type. In-line viewers, such as a video-player and an image-viewer, are available for the following file-formats:

  • mp4 files

  • wav files (when a streaming file is available)

  • text/html/pdf files

  • image files (jpg/tiff)

For other file-types, the resource page will display more details about the file, as well as the pid.

Bundles containing one or more images can also be viewed via the image gallery. To open this viewer, click the ’Open Gallery’ button above the list of resources on the Bundle-overview page. The gallery view displays all images inside the Bundle, as well as a description of the image, when available.

Gallery view

2.7 Downloading resources

Resources may also be downloaded for personal use, when access levels permit this. You can download individual resources, by clicking the download-icon next to the resource on the resource-list.

Downloading multiple files at once is possible via the ’Basket’ system. It is based on a shopping cart principle, allowing you to add complete Bundles and/or individual resources to a basket. Afterwards, you can ’checkout’ your basket and download the resources in a zip-file. This feature is only available if you are logged in with your account (see section 3.2 for more information about accounts).

2.7.1 Basket management

To access and manage your basket(s), click the ’My baskets’ item in the top-menu.

My basket menu item

The ’My baskets’ overview page displays a list of your baskets and queued checkout jobs, if any. To create a new basket, click the ’add’ tab. Next, enter a name (and description) for the new basket and click ’Add basket’. The newly created basket will be added to the overview, and will now be available from the drop-down menu on each Bundle page.

My baskets overview

2.7.2 Adding content to a basket

When you are logged in and browsing the archive, an ’Add to basket’ button is visible on each Bundle page and each resource page.

My basket button

The default basket is called ’username’s Basket’. Any additional baskets that you create will be listed in the drop-down menu as well.

You can add one or more Bundles to the selected basket by clicking the ’Add to basket’ button on the Bundle page. This will add all resources of the Bundle to the basket.

Bundle added notification

In case you only want to download certain files from all the Bundles you add to your basket, you can select the file-types upon checkout of your basket. See the next section for more information about checking out a basket.
To add an individual resource to your basket, click on the resource and then click the ’Add to basket’ button on the resource overview page.

2.7.3 Basket download

To manage or download the content of your baskets, click ’My Baskets’ from the main menu. Clicking on a basket on the overview tab will display the contents of the basket. Here, you can remove Bundles or resources if needed, and ’checkout’ (download) the basket.
You can select the Bundles/resources you want to checkout or you can choose to checkout all items from the basket. There is also the option to specify which file-types you would like to download. If no selection is made, all file-types will be checked out.
The total amount that can be added to a ZIP (before compression) is 200 GB. If your selection is larger, only a part that fits within 200 GB will be included.

Overview of basket contents

Upon checkout, a ZIP file will be created that contains all resources from the items in your basket, or from the items you’ve selected above. Note that only files to which you have access will be added to the ZIP. The creation of the ZIP file will take some time, depending on the size of your selection. You can view the current state of the process in the main overview of ’My Baskets’.
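The checkout rules described above (file-type selection, access check and the 200 GB limit) can be sketched as follows. This is an illustration under stated assumptions, not the archive's implementation: the guide does not say which files are left out when a selection exceeds the limit, so the sketch simply skips every file that no longer fits. The names `BasketFile` and `select_for_zip` are hypothetical, and 200 GB is taken as decimal gigabytes.

```python
from typing import List, NamedTuple, Optional, Set

ZIP_LIMIT_BYTES = 200 * 10**9  # 200 GB before compression (assumes decimal GB)


class BasketFile(NamedTuple):
    name: str
    size_bytes: int
    accessible: bool  # whether the current user has access to this file


def select_for_zip(files: List[BasketFile],
                   file_types: Optional[Set[str]] = None,
                   limit: int = ZIP_LIMIT_BYTES) -> List[BasketFile]:
    """Pick the files that would end up in the ZIP, in basket order."""
    selected, total = [], 0
    for f in files:
        extension = f.name.rsplit(".", 1)[-1].lower()
        if not f.accessible:
            continue  # only files you have access to are added
        if file_types and extension not in file_types:
            continue  # no selection means all file-types are checked out
        if total + f.size_bytes > limit:
            continue  # assumption: files that no longer fit are skipped
        selected.append(f)
        total += f.size_bytes
    return selected


# Hypothetical basket, with a tiny limit of 200 bytes to keep the example readable.
basket = [
    BasketFile("interview.mp4", 150, True),
    BasketFile("interview.wav", 100, True),
    BasketFile("notes.pdf", 30, True),
    BasketFile("private.mp4", 10, False),
]
print([f.name for f in select_for_zip(basket, limit=200)])
# prints ['interview.mp4', 'notes.pdf']
```

In the example, the audio file is left out because it would exceed the limit, and the last file is left out because the user has no access to it.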

Notification of queued zip-file creation

You will receive an email once the zip file is ready for download. Please also check your junk-folder for this email.

3 Access to resource data

Each resource file in a Bundle has a certain access level, set by the depositor of the data. Some data may be openly accessible, whereas other data may be accessible for specific users only. These access levels are described in section 2.3. For resources that aren’t openly accessible, registration with the archive or a request for access is needed, depending on the access level.

3.1 Request access

To request access to restricted data, you need to have an account (see section 3.3 for how to create an account) and you need to be logged in to the system.

Fill out the request form found here: Contact Form. Please include the handle-link (pid) to the requested materials when filling out the form. This persistent identifier is found on each Bundle-overview page.

Normally, we will need to contact the depositor, which may take some time. Once we have received a response from the depositor, we will inform you whether or not you will be granted access to the materials. The data will then be made accessible to you, when permitted.

3.2 Account

In order to access resources which are labelled as ’registered’, or to request access to data that have a ’restricted’ access level, an account with the archive is needed. The next section describes how to register for an account. An account is also needed if you want to use the bulk-download functionality, which is described in section 2.7.
You can apply for ’academic’ status once you have registered an account. This academic status grants access to resources with a permission level set to ’academic’. We will verify whether your account is eligible for this status. In order to request the academic status, please contact us once you have set up your account.

3.3 Registration

To register an account, go to the login page and select the ’create new account’ tab. If your institution is part of one of the supported Identity Federations, you may be able to use your own institutional account to log in to the archive with the "Login with Shibboleth" link. In this case, creating an account is not required.

Account registration
  1. Enter a username

  2. Enter a valid email address

  3. Enter your full name

  4. Enter your affiliation (e.g. institution or department)

Once you have registered for an account, it will be validated. You will receive an e-mail when validation is complete. This e-mail contains a link that allows you to set your password. You will also need to read and agree to the terms of use. After registration is complete, you can log in and browse the archive.