Deposit Manual MPI

The Language Archive - archiving manual MPI

May 2023

Introduction

The repository system of The Language Archive (TLA) features an integrated web-based deposit system that allows users to archive their data. This manual describes the use of this deposit system. Since it is an integral part of the archive, there is no separate URL for it; the deposit functionality becomes visible automatically to logged-in users of the archive who have been granted deposit permissions (see Deposit permissions). TLA hosts research data from researchers at the Max Planck Institute for Psycholinguistics (MPI), as well as certain external depositors. The workflow described in this document is meant for MPI users only.

In order to facilitate long-term preservation of its holdings, TLA accepts a limited number of file formats, which are listed on this page. That page also contains any further conditions that apply to some of the accepted file types. File names can only contain alphanumeric characters (without accents/diacritics), dots, hyphens and underscores. Spaces in file names are not allowed. For each accepted format, only specific file extensions are allowed, as listed in the table. File extensions should all be lower case with the exception of ’TextGrid’.
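The naming rules above can be checked locally before files are placed in the workspace. The following is a minimal sketch, not part of the deposit system: the function name `check_file_name` and the set `EXAMPLE_EXTENSIONS` are hypothetical, and the extensions listed are placeholders for illustration only. The authoritative list remains the archive's page of accepted file formats.

```python
import re

# Rules taken from the paragraph above: unaccented letters, digits, dots,
# hyphens and underscores only; no spaces.
ALLOWED_NAME = re.compile(r"[A-Za-z0-9._-]+")

# Hypothetical subset of extensions, for illustration only. Consult the
# archive's list of accepted file formats for the real set.
EXAMPLE_EXTENSIONS = {"wav", "mp4", "eaf", "pdf", "txt", "TextGrid"}


def check_file_name(name, allowed_extensions=EXAMPLE_EXTENSIONS):
    """Return a list of problems found in a file name (empty if none)."""
    problems = []
    if not ALLOWED_NAME.fullmatch(name):
        problems.append("contains characters other than A-Z, a-z, 0-9, '.', '-', '_'")
    stem, dot, extension = name.rpartition(".")
    if not dot or not stem:
        problems.append("has no file extension")
    elif extension != "TextGrid" and extension != extension.lower():
        problems.append("extension should be lower case")
    elif extension not in allowed_extensions:
        problems.append("extension is not in the example list")
    return problems


if __name__ == "__main__":
    for candidate in ["interview_01.wav", "my recording.wav", "interview_01.WAV"]:
        print(candidate, check_file_name(candidate))
```

Returning a list of problems rather than a single yes/no makes it possible to report every issue with a file name at once.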

Below we explain a number of important concepts that are used throughout the deposit system:

  • Metadata: Metadata is information about the archived materials that allows others to discover and re-use them. TLA uses the CMDI metadata framework as a standard for its descriptive metadata. We support a selected number of CMDI profiles that are listed here. At the moment, the deposit system includes web forms for editing metadata using the MPI_Bundle, MPI_Collection, lat-corpus and lat-session profiles. Metadata in one of the other supported profiles cannot be edited online but can be uploaded as files.

  • Bundle: a Bundle in the archive contains one or more data files (e.g. audio and video data) and their associated metadata. Typically, all files that are linked to the same Bundle should have a logical relation to one another, e.g. a video recording and its transcript, all trials for a given experimental task for a given subject, or all photographs of a given event. This typically means that the metadata for the Bundle applies to all files within the Bundle, e.g. in terms of date, location and participants. There is currently a limit of 50 files that can be attached to one Bundle. For certain exceptional use cases where a larger number of files needs to be attached to one Bundle, those files can be zipped together in one zip file. Please check with the archive staff before doing so. For language corpora, this is typically not allowed.

  • Collection: a Collection in the archive is used to group Bundles or other Collections. This enables us to create hierarchical Collection structures. A Collection also has descriptive metadata.

  • MPI workspace: MPI users need a "workspace" folder inside the network drive/share of their department for storing their research data and for uploading it to the archive. If you do not have a workspace yet, you can request one by contacting datasupport@mpi.nl. A workspace will contain an ’archive_deposit’ folder that will be used for archiving data (if no such folder is present in your workspace yet, contact datasupport@mpi.nl).

Deposit permissions, user account & dashboard

Deposit permissions

Before you can use the archive’s deposit system, deposit permissions need to be assigned to your account for a given Collection within the archive. MPI users can request this by writing an email to datasupport@mpi.nl. Please specify your username when sending your request.

Login

MPI users can use their MPI account to log in to the archive. After logging in, you will be taken to the ’My account’ page. Once your deposit permissions have been assigned, you can also go to ’My dashboard’ (see My dashboard) from the top navigation menu.

My account

In the ’My account’ section, you can view and edit your account information. You can change your password, add or edit a profile picture, and set your timezone and affiliation.

My account

My dashboard

’My dashboard’ is the central hub for the archiving activities: the ’My Bundles’ section displays a list of Bundles that are currently in progress. ’My Collections’ lists the active Collections, to which you can add Bundles. ’My reports’ contains reports about Bundle validation and archiving actions.

My dashboard view

My Bundles

The ’My Bundles’ tab displays a list of all the Bundles that you are currently working on. These are either new Bundles that have not been archived yet, or updated Bundles that have not yet been submitted for archiving.

The table on this page contains some information about each Bundle. You can see to which Collection a Bundle belongs (if any), what the status of the Bundle is (see below), whether metadata have been created for it, and when it was initiated. You can also delete Bundles by clicking the ’delete’ link. (Note that deleting Bundles that are updates of existing Bundles in the archive will only delete the update, not the original version in the archive.)

The ’status’ for a Bundle can be:

  • Open: the Bundle can be edited (either metadata or adding/removing resources)

  • Validating: the Bundle is being checked for valid metadata and resources.

  • Processing: the Bundle is being archived.

  • Failed: the Bundle validation or archive action failed (see the report under My reports for more information). It may be possible to remedy the issue by re-opening the Bundle, editing it and submitting it again. In case of continued problems, contact the archive staff.
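The way a Bundle moves between these statuses can be pictured as a small state machine. The sketch below is only an illustration inferred from the descriptions in this manual, not the deposit system's actual implementation: the `VALID` and `ARCHIVED` states, the event names, and the function `next_status` are all assumptions introduced here.

```python
from enum import Enum


class Status(Enum):
    OPEN = "Open"
    VALIDATING = "Validating"
    VALID = "Valid"        # assumption: implied by the manual, not in the list above
    PROCESSING = "Processing"
    FAILED = "Failed"
    ARCHIVED = "Archived"  # assumption: archived Bundles leave the 'My Bundles' list


# (current status, user action or system outcome) -> next status.
# The event names are hypothetical labels chosen for this sketch.
TRANSITIONS = {
    (Status.OPEN, "validate"): Status.VALIDATING,
    (Status.VALIDATING, "success"): Status.VALID,
    (Status.VALIDATING, "failure"): Status.FAILED,
    (Status.VALID, "archive"): Status.PROCESSING,
    (Status.VALID, "re-open"): Status.OPEN,
    (Status.PROCESSING, "success"): Status.ARCHIVED,
    (Status.PROCESSING, "failure"): Status.FAILED,
    (Status.FAILED, "re-open"): Status.OPEN,
}


def next_status(current, event):
    """Return the status that follows `current` after `event`."""
    try:
        return TRANSITIONS[(current, event)]
    except KeyError:
        raise ValueError(
            f"'{event}' is not possible while the Bundle is {current.value}"
        ) from None


if __name__ == "__main__":
    status = Status.OPEN
    for event in ["validate", "failure", "re-open", "validate", "success", "archive"]:
        status = next_status(status, event)
        print(event, "->", status.value)
```

The table form makes one point from the manual explicit: a Bundle can only be edited while it is Open, so a failed or validated Bundle has to be re-opened before any change.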

My Bundles (validating) view

My Collections

’My Collections’ displays all of your ’active’ Collections. These can be Collections you have added via the ’activate’ tab in the archive browser (see Activate Collection), or newly created Collections, which are automatically added.

You can click on an active Collection for more details. There is also a shortcut to view the Collection in the archive. Removal of the active Collection from the list can be done with the ’delete’ function. It will then be put on the list of inactive Collections. You can view the list of your inactive Collections via the link below the active Collections.

My Collections view

When viewing an active Collection in more detail, you can see whether there are any Bundles in progress that are associated with the Collection. You can work on them by clicking on their name. You can also validate, archive and re-open one or more Bundles by checking the box in the ’select’ column and clicking the desired action button. Note that for these actions, metadata for the Bundle should already be available. See the image below for a detailed overview.

My Collections: detail view

My reports

The ’My reports’ tab consists of an overview of notifications sent to you by the deposit system. These will mostly be validation and archiving reports of your Bundles. You can see whether a Bundle was validated or archived successfully, or whether it failed. You can view detailed information by clicking the report.

My reports view

If a Bundle failed to validate, or if archiving failed, you can view the report and try to find out what went wrong, or you can contact the archive staff with the included report and inform them that something went wrong.

You can also delete all reports older than 2 weeks, or choose to delete them all by clicking the appropriate button. This action cannot be reversed!

Archiving data

Activate Collection

When you are ready to archive, inform the data-management team (datasupport@mpi.nl) that you wish to archive a project. They will then create a new empty Collection within your (department’s) section of the archive, based on a metadata form which will be sent to you.

In order to add Bundles or Collections to a Collection, it needs to be ’Activated’. To do so, locate the Collection in the archive browser and click the ’Activate’ tab on the Collection page. The Collection will then be listed in the ’My Collections’ overview of your dashboard (see My Collections). You will only see the ’Activate’ tab on Collections to which you’ve been assigned deposit permissions.

Collection option tabs

Once your Collection is active, you can start adding sub-Collections and/or Bundles to it. Bundles can either be added from the ’My Bundles’ overview or from the browser-tab shown below. Additional Collections can be added from the archive browser tabs only.

Active Collection option tabs

Add Collection

After activating your Collection, you can choose to add a new (sub)Collection (via the Add Collection tab in the archive browser), or to update the metadata of the current Collection (see Update Collection).

To add a Collection to an active Collection, browse to the Collection in the archive browser and click the ’Add Collection’ tab. You can also reach the Collection via ’My dashboard>My Collections’ by clicking the ’Collection in archive’ link there and then clicking the ’Add Collection’ tab.

Add Collection

Select the metadata profile you wish to use. For most language corpora, this should be lat-corpus. For most other types of Collections, you should use MPI_Collection. Collection metadata in other supported profiles can be uploaded as a file. Once you’ve selected the appropriate profile, fill out all the mandatory fields and any other metadata fields you wish to add. The ’Add’ button found under certain metadata fields allows you to add multiple instances of a certain element.

You also need to select an initial access policy for the Collection. In exceptional cases where the Collection needs to be temporarily invisible (in cases where the title alone would give away too much information), you can make the Collection "private" and tick the "hide metadata" box. This will make the Collection only visible to you when searching and browsing the archive, and to no one else. The access policy can be refined later if necessary.

Note that most of the metadata values that you enter for the MPI_Collection will become the default values for any MPI_Bundle or MPI_Collection that you add to the Collection later on, such that you don’t need to enter the same information again, but only modify what is relevant.

When you have filled out all of the metadata, click ’Submit’. The Collection will be created and added to your list of active Collections. If you made a mistake while filling out any metadata, you will be notified by the system and you will need to correct the error and try the submission again. The submission of the Collection can take a bit of time. Please wait for it to finish.

Once the Collection has been created, you can update its metadata at a later stage if necessary, see Update Collection for additional info.

Add Bundle

Create Bundle

To add actual data files to your Collections, you will need to create Bundles. A Bundle should contain one or more files, in file formats that are accepted by the archive as listed on this page. See the remarks about file names and extensions in the introduction of this manual.

Create a folder inside your MPI workspace/archive_deposit folder. In this folder (for instance: workspace/archive_deposit/Bundle1), you can place the data you wish to add to the Bundle. Your workspace can be found on your department network share. If you do not have a workspace, please contact datasupport@mpi.nl.

To create a Bundle, go to ’My dashboard>My Bundles’ and click ’Initiate new Bundle’. Alternatively, you can initiate a new Bundle via the archive browser. Browse to an active Collection (Activate Collection) and choose ’Add Bundle’ from the tabs displayed.

On the page that appears, you give the Bundle a name, choose the Collection it belongs to and select which access rights should be set. You then select the workspace that contains the resource data. The system will automatically detect and scan the archive_deposit folder, after which you can select the sub-folder you created for the research data.

Add Bundle 1

Below you find a brief explanation of the fields you will need to fill out:

  • Title: enter the title of your Bundle. This title will also be copied into the metadata and will be the label for the Bundle that is displayed in the browser. Titles need to be unique within a given parent Collection.

  • Parent Collection: select the Collection you wish to add the Bundle to. (This is already set if you initiate the Bundle via the "add Bundle" tab of the Collection.)

  • Access policies: Select which access policy should be applied to the files within this Bundle.

    • "Open" materials can be accessed by anyone without having to log in.

    • "Registered Users" means any user with a valid account for the archive.

    • "Academic Users" are users that log in with an academic account or whose academic status has been verified.

    • "Restricted" means that the materials are only accessible to the depositor and authenticated users. The metadata can be either visible or hidden; choose whichever is needed.

  • How will you provide metadata: this can be done by filling out a web-form, or by uploading a CMDI file.

    • Enter using form: metadata can be entered through an online form.

    • Upload a CMDI file: Upload a CMDI metadata file that you’ve created with a different tool. In case there are links to resources in your CMDI file, you will need to provide those files in the Workspace folder you select in the next step (number of files and filenames need to match). The metadata can be further modified using an online form once the bundle has been created.

    • Upload a CMDI as a template: Upload an existing CMDI metadata file to be used for this bundle, as a template. Any existing resource links will be removed and instead the resources in the Workspace folder that you will choose in the next step will be added to the bundle. The metadata can be further modified using an online form once the bundle has been created.

    If needed, you can edit the uploaded CMDI file in a web-based form once the bundle has been created.

  • How will you provide files: select a folder within your MPI workspace.

  • Department: choose your department of the MPI.

  • Workspace: choose your personal Workspace folder.

  • Subfolder: select the appropriate folder inside the ’archive_deposit’ folder of your workspace. This can be repeated to select further subfolders, until you’ve reached the final folder that contains the files for your Bundle. The final selected folder cannot contain any subfolders and, as noted in the introduction, can contain at most 50 files. The current path you’ve selected is displayed at the top of the workspace selection form.
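The two folder constraints (no subfolders, at most 50 files) can be verified before a Bundle is initiated. This is a minimal local sketch under stated assumptions: the function name `check_bundle_folder` is hypothetical, the path in the usage example is a placeholder, and the deposit system performs its own checks regardless of what this script reports.

```python
from pathlib import Path

MAX_FILES_PER_BUNDLE = 50  # limit stated in this manual


def check_bundle_folder(folder):
    """Return a list of problems with a prospective Bundle folder (empty if none)."""
    folder = Path(folder)
    if not folder.is_dir():
        return [f"{folder} is not a folder"]
    entries = list(folder.iterdir())
    subfolders = sorted(entry.name for entry in entries if entry.is_dir())
    files = [entry for entry in entries if entry.is_file()]
    problems = []
    if subfolders:
        problems.append("contains subfolders: " + ", ".join(subfolders))
    if len(files) > MAX_FILES_PER_BUNDLE:
        problems.append(
            f"contains {len(files)} files (limit is {MAX_FILES_PER_BUNDLE})"
        )
    return problems


if __name__ == "__main__":
    # Placeholder path; replace it with the folder inside your own archive_deposit folder.
    print(check_bundle_folder("workspace/archive_deposit/Bundle1"))
```

Note that every file in the folder is counted, including hidden files that some operating systems create automatically.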

When you have filled out everything, click ’save’ to continue. The next page will display the created Bundle containing the resources from the selected workspace folder.
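For the ’Upload a CMDI file’ option described above, the resource links in the CMDI file have to match the files in the selected folder. The sketch below compares the two locally. It relies on the standard CMDI layout, in which each ResourceProxy element has a ResourceType and a ResourceRef child. How the deposit system itself matches links to files is not described in this manual, so comparing by the last path segment of each link is an assumption, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlparse


def local_name(tag):
    """Strip the XML namespace so different CMDI versions are handled alike."""
    return tag.rsplit("}", 1)[-1]


def linked_file_names(cmdi_path):
    """Collect file names referenced by ResourceProxy elements of type 'Resource'."""
    names = set()
    for element in ET.parse(cmdi_path).iter():
        if local_name(element.tag) != "ResourceProxy":
            continue
        fields = {
            local_name(child.tag): (child.text or "").strip() for child in element
        }
        if fields.get("ResourceType") == "Resource" and fields.get("ResourceRef"):
            # Assumption: the last path segment of the link is the file name.
            path = unquote(urlparse(fields["ResourceRef"]).path)
            names.add(PurePosixPath(path).name)
    return names


def compare_links_to_folder(cmdi_path, folder):
    """Return (linked but missing from the folder, in the folder but not linked)."""
    linked = linked_file_names(cmdi_path)
    present = {entry.name for entry in Path(folder).iterdir() if entry.is_file()}
    return sorted(linked - present), sorted(present - linked)
```

Two empty lists mean that the number of files and the file names agree. Anything in the first list is linked in the metadata but absent from the folder, and anything in the second list is in the folder but not linked.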

Bundle overview

The Bundle overview displays the files that will be included in the Bundle.

Bundle overview

  • Fill in/edit metadata for Bundle: This allows you to fill in the metadata for the data included in the Bundle (Fill in metadata for Bundle). Filling in is only enabled if you opted to provide the metadata using a form. If you uploaded a CMDI file, you can edit the metadata here.

  • Validate Bundle: Here you can check if your resource data and metadata are valid before archiving it (Validate Bundle). The data folders of valid Bundles are moved away from the workspace location, such that they can no longer be modified.

  • Re-open Bundle: When a Bundle is valid, it will be closed so you can archive it. Clicking ’Re-open’ will allow you to work on the Bundle again. Data folders will be moved back to their original location, such that they can be modified. After re-opening a Bundle, validation is required again before the Bundle can be archived.

  • Archive Bundle: Click this to archive your valid Bundle. The resource data will be moved to the archive together with the metadata (Archive Bundle).

  • Edit Bundle properties: You can edit some of the Bundle properties for Bundles that are still open, e.g. change the name, access policy or select a different data folder.

  • Delete Bundle: Click this if you want to delete the Bundle completely. (Selected files in your workspace will not be deleted).

Fill in metadata for Bundle

After creating the Bundle, choose the ’Fill in metadata for Bundle’ button to start creating metadata describing the resource data inside the Bundle. This step is not necessary when you chose to upload a CMDI file containing the metadata. You can still edit the metadata here, or you can immediately validate the Bundle (Validate Bundle).

On the next page, choose a metadata profile to use. After this selection, you will be presented with a form based on the chosen profile that you will have to fill out.

Metadata form

The ’Add’ button allows you to add multiple instances of a certain element. For instance, you can specify the location for your project by adding an address and/or geo-coordinates.

Some values of the form may have already been filled in. If so, these values have been inherited from the Collection that you have added the Bundle to. You can modify or remove these values if appropriate for the Bundle. Certain values, such as a description or an actor, can be saved as a preset and recalled for later use. See Metadata presets for more information.

When all of the (required) metadata has been filled in, click ’Submit’. If there are errors (e.g. an empty required field), you will be prompted to correct them. Otherwise, you will be taken back to the Bundle overview page where you can further process your Bundle.

Metadata presets

When filling out metadata for a Bundle, certain values you enter can be saved as a preset for re-use. This is helpful when archiving many Bundles containing the same description or location, for example. These presets can be applied to any Bundle you create; they are project-independent.

Values that can be saved and loaded as presets are indicated with the following icon:

Metadata preset load/save icons

To save a preset, first enter the metadata values, then click the ’save’ button. Enter a label for your preset and click ’save’ in the dialog to store the metadata value as a preset.

Saving a metadata preset

To load a previously saved preset, click the load icon and select the preset you previously saved. The values of the chosen preset will be entered in the metadata fields.

Loading a metadata preset

You can edit the values after loading a preset if need be, and save again. You can overwrite the old preset, or add a new preset.

Validate Bundle

Once you have successfully created or uploaded metadata for the resource data inside your Bundle, you must validate the Bundle prior to archiving it. The system will then check both the resource data and metadata to make sure the data is acceptable and contains no errors. You can only validate a Bundle after successfully creating a Bundle containing both resources (Create Bundle) and metadata (Fill in metadata for Bundle).

To validate a Bundle, go to the Bundle overview page (’My dashboard>My Bundles’), and click the ’validate’ button. Alternatively, and if you would like to validate multiple Bundles at once, you can go to ’My dashboard>My Collections’. Select the Collection to which the Bundle(s) belong(s), and check the ones you wish to validate.

Once validation starts, you will be taken to the ’My dashboard>My Bundles’ page and the progress will be indicated (My Bundles). When the validation process is done, the ’Status’ column will tell you the outcome (i.e. whether the Bundle is valid, or whether it failed and requires further attention).

My Bundles status view

In case the Bundle failed to validate, you will have to make corrections to it. You can check the validation report under ’My reports’ for more info regarding the error(s) (My reports). You can find more information on how to correct errors in the next section.

If the Bundle is valid, you may proceed with archiving it (see Archive Bundle).

Correcting failed Bundles

If a Bundle failed to validate, you will have to correct the error(s). To do so, check the report of the failed Bundle (My dashboard>My reports).

Detailed report of failed Bundle

A Bundle may fail to validate due to issues with resource data or metadata. In both cases, you can read from the report what the source of the error is.

  • Invalid resource data: In the example picture above, a specific file did not validate. This happens when a file is not in one of the accepted file formats (see the page of accepted file types). You can choose to remove the file from the workspace folder, or convert the file to an accepted file format. Alternatively, contact datasupport@mpi.nl for help.

  • Invalid metadata: If you entered metadata for the resources in the Bundle, it may happen that something you entered is invalid. In this case, you will have to remove the invalid metadata file and re-create it. To do so, click on the Bundle to go to the Bundle overview page and click ’Edit Bundle properties’. Next, remove the attached metadata file by clicking ’remove’. Finally, click ’Save’ to save the Bundle.

    Remove attached metadata file from Bundle

    Back in the Bundle overview page, you can now fill out the new metadata for the Bundle. See Fill in metadata for Bundle for more info.

Once you have corrected the error(s), you may validate the Bundle again by clicking on the failed Bundle. You will be taken to the Bundle overview, where you can click ’Validate Bundle’ again (Validate Bundle).

In case you’re seeing different errors or in case your Bundle fails to validate repeatedly, please contact the archive staff.

Bundle overview failed Bundle

Archive Bundle

Archiving a Bundle is easy, once it has been validated by the system. Simply go to the ’Bundle overview’ page by clicking on the Bundle in the ’My dashboard>My Bundles’ page. From there, click the ’Archive Bundle’ button.

Bundle overview ready for archiving

Alternatively, or if you want to archive multiple Bundles at once, go to ’My dashboard>My Collections’, click the Collection to which the Bundle belongs, check the boxes in the ’select’ column and click ’Archive Bundle(s)’.

Archive multiple Bundles via ’My Collections’

The Bundle(s) will be processed and you will be taken to the ’My dashboard>My Bundles’ page. After archiving is complete, a report will be placed in the ’My reports’ tab. If anything went wrong during archiving, it will be stated there, so please check the report(s) after archiving your data.

Update content

After archiving your data, you can revise it at any given time. You can update the metadata of a Collection or a Bundle, or add/remove resources from an existing Bundle.

Update Collection

To update the metadata of a Collection, browse to the Collection in the archive (via ’My dashboard>My Collections>link to active Collection’). Next, click the tab ’Update’ to start updating the metadata. When done, click ’Submit’ and the metadata for the Collection will be updated.

Update Collection metadata

Update Bundle

You can update a previously archived Bundle, by adding or removing resources, but also by updating the metadata. Go to the archived Bundle, and either select the ’Update Bundle’ tab (to add/remove resources) or the ’Update metadata’ tab (to update metadata).

Update Bundle resources

Update Bundle resources

To update the resources in the Bundle, fill out the form displayed on the screen. Select the appropriate folder in your workspace containing the additional resource files. If you only want to remove specific resource files, you currently need to select an empty folder there. When done, click ’Submit’ and you’ll be taken to the Bundle overview page.

In the overview, you will see the list of files currently in the archive, which you can select to remove from the Bundle. Below it is the list of files that will be added to the current Bundle.

Once you have updated your Bundle, click ’Validate’ to check if all data is valid. If everything is valid, you may archive the updated Bundle. If you made a mistake or you notice that not all files that you wish to add are in the new overview, you can click ’delete Bundle’ to start over. This does not delete the Bundle from the archive!

Update Bundle metadata

If you only want to update the metadata of a Bundle, you can do so by selecting the ’Update metadata’ tab of the archived Bundle. You can either upload a CMDI file containing the updated metadata, or edit the metadata via the form. The form will display the currently filled out metadata for the Bundle, which can be edited. Once done, click ’Submit’.