Deposit Manual

The Language Archive - archiving manual

version 1.1

August 2018 - Jeroen Geerts


Introduction

The repository system of The Language Archive (TLA) features an integrated web-based deposit system that allows users to archive their data. This manual describes the use of this deposit system. Since it is an integral part of the archive, there is no separate URL for it; the deposit functionality becomes visible automatically to logged-in users of the archive who have been granted deposit permissions (see the section called “Deposit permissions”).

TLA hosts research data from researchers at the Max Planck Institute for Psycholinguistics (MPI), as well as certain external depositors. The workflows for both groups of users are described in this manual and differ slightly. The main difference is that MPI users will use an internal network drive/share for uploading their data, and external users will use a Nextcloud cloud instance hosted at the MPI.

In order to facilitate long-term preservation of its holdings, TLA accepts a limited number of file formats, which are listed on this page. That page also contains any further conditions that apply to some of the accepted file types. File names can only contain alphanumeric characters (without accents/diacritics), dots, hyphens and underscores. Spaces in file names are not allowed. For each accepted format, only specific file extensions are allowed, as listed in the table. File extensions should all be lower case with the exception of 'TextGrid'.
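The file naming rules above can be checked locally before any data is placed in a workspace or Nextcloud folder. The following Python function is a minimal sketch of such a check, not the archive's own validator: the function name `file_name_problems` is hypothetical, the requirement that every file has an extension is an assumption, and the check does not know the table of accepted formats and extensions.

```python
import re

# Characters that are NOT allowed according to this manual. The space is left
# out of this set because it is reported separately below.
DISALLOWED = re.compile(r"[^A-Za-z0-9._\- ]")


def file_name_problems(name: str) -> list[str]:
    """Return a list of problems found in a file name; an empty list means none were found."""
    problems = []
    if " " in name:
        problems.append("contains spaces")
    illegal = sorted(set(DISALLOWED.findall(name)))
    if illegal:
        problems.append("contains disallowed characters: " + " ".join(illegal))
    stem, dot, extension = name.rpartition(".")
    if not dot or not stem:
        # Assumption: every deposited file needs an extension.
        problems.append("has no file extension")
    elif extension != "TextGrid" and extension != extension.lower():
        problems.append("extension should be lower case")
    return problems
```

For example, `file_name_problems("session_01.wav")` returns an empty list, whereas a name such as `my recording.WAV` is reported both for the space and for the upper-case extension.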

Below we explain a number of important concepts that are used throughout the deposit system:

  • Metadata: Metadata is information about the archived materials that allows others to discover and re-use them. TLA uses the CMDI metadata framework as a standard for its descriptive metadata. We support a select number of CMDI profiles that are listed here. At the moment, the deposit system includes web forms for editing metadata using the MPI_Bundle, MPI_Collection, lat-corpus and lat-session profiles. Metadata in one of the other supported profiles cannot be edited online but can be uploaded as files.

  • Bundle: a Bundle in the archive contains one or more data files (e.g. audio and video data) and their associated metadata. Typically, all files that are linked to the same Bundle should have a logical relation to one another, e.g. a video recording and its transcript, all trials for a given experimental task for a given subject, all photographs of a given event, etc. This typically means that the metadata for the Bundle applies to all files within the Bundle, e.g. in terms of date, location, participants, etc. There is currently a limit of 50 files that can be attached to one Bundle. In exceptional cases where a larger number of files needs to be attached to one Bundle, those files can be combined into a single zip file. Please check with the archive staff before doing so. For language corpora, this is typically not allowed.

  • Collection: a Collection in the archive is used to group Bundles or other Collections. This enables us to create hierarchical Collection structures. A Collection also has descriptive metadata.

  • MPI workspace: MPI users need a "workspace" folder inside the network drive/share of their department for storing their research data and for uploading it to the archive. If you do not have a workspace yet, you can request one by contacting datasupport@mpi.nl. A workspace will contain an 'archive_deposit' folder that will be used for archiving data. If no such folder is present in your workspace yet, contact datasupport@mpi.nl.

  • Nextcloud: External depositors make use of Nextcloud to upload data. Nextcloud is an open source cloud solution and the MPI hosts an instance specifically for the archive, see the section called “Nextcloud data upload (external depositors)”.


Chapter 1. Deposit permissions, user account & dashboard

Deposit permissions

Before you can use the archive's deposit system, deposit permissions need to be assigned to your account for a given Collection within the archive.

MPI users can request this by writing an email to datasupport@mpi.nl. Please specify your username when sending your request.

New external users who would like to deposit materials with The Language Archive should first send a request to the archive management staff at tla@mpi.nl, in which they describe their Collection as explained in the Collection Development Policy. This request will then be evaluated and upon a positive outcome, the user will be requested to create an account for the archive (see next section).

Existing external depositors who have never used the current deposit system should also contact the archive management staff to have a Nextcloud account set up.

Registration & Login

MPI users can use their MPI account to log in to the archive.

New external depositors who have been granted permission to deposit their materials can register as a new user by going to the 'Create new account' tab on the login page and filling out the requested details. You will get a confirmation email and your account will need to be activated manually by the archive staff.

After logging in, you will be taken to the 'My account' page. You can also go to 'My dashboard' (see the section called “My dashboard”) from the top navigation-menu once your deposit permissions have been assigned.

My account

In the 'My account' section, you can view and edit your account information. You can change your password, add or edit a profile picture, and set your timezone and affiliation. If you have depositor rights and are an external user, you will also see your Nextcloud UserID here.

Figure 1.1. My account view

My dashboard

'My dashboard' is the central hub for the archiving activities: the 'My Bundles' section displays a list of Bundles that are currently in progress. 'My Collections' lists the active Collections, to which you can add Bundles. 'My reports' contains reports about Bundle validation and archiving actions.

Figure 1.2. My dashboard view

My Bundles

The 'My Bundles' tab displays a list of all the Bundles that you are currently working on. These are either new Bundles that have not been archived yet, or updated Bundles that have not yet been submitted for archiving.

The table on this page contains some information about each Bundle. You can see to which Collection a Bundle belongs (if any), what the status of the Bundle is (see below), whether metadata have been created for it, and when it was initiated. You can also delete Bundles by clicking the 'delete' link. (Note that deleting Bundles that are updates of existing Bundles in the archive will only delete the update, not the original version in the archive.)

The 'status' for a Bundle can be:

  • Open: the Bundle can be edited (either metadata or adding/removing resources)

  • Validating: the Bundle is being checked for valid metadata and resources.

  • Processing: the Bundle is being archived.

  • Failed: the Bundle validation or archive action failed (see the section called “My reports” for more information). It may be possible to remedy the issue by re-opening the Bundle, editing it and submitting it again. If problems persist, contact the archive staff.
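The way a Bundle moves between these statuses can be summarised as a small state model. The Python sketch below is one reading of this manual and not a description of the deposit system's internals: the 'Valid' status is taken from the validation sections later in this manual, 'Archived' is a hypothetical label for a Bundle that has left the 'My Bundles' list, and the transition table is an assumption.

```python
from enum import Enum


class Status(Enum):
    OPEN = "Open"
    VALIDATING = "Validating"
    VALID = "Valid"            # mentioned in the validation sections of this manual
    PROCESSING = "Processing"
    FAILED = "Failed"
    ARCHIVED = "Archived"      # hypothetical label, not a status shown in 'My Bundles'


# Assumed transitions, derived from the actions described in this manual:
# validate, re-open and archive.
TRANSITIONS = {
    Status.OPEN: {Status.VALIDATING},
    Status.VALIDATING: {Status.VALID, Status.FAILED},
    Status.VALID: {Status.OPEN, Status.PROCESSING},
    Status.PROCESSING: {Status.ARCHIVED, Status.FAILED},
    Status.FAILED: {Status.OPEN},
    Status.ARCHIVED: set(),
}


def can_move(current: Status, target: Status) -> bool:
    """Return True if the assumed model allows going from current to target."""
    return target in TRANSITIONS[current]
```

In this model an open Bundle cannot be archived directly (`can_move(Status.OPEN, Status.PROCESSING)` is false); it has to pass validation first.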

Figure 1.3. My Bundles (validating) view

My Collections

'My Collections' displays all of your 'active' Collections. These can be Collections you have added via the 'activate' tab in the archive browser (see the section called “Activate Collection”), or newly created Collections, which are automatically added.

You can click on an active Collection for more details. There is also a shortcut to view the Collection in the archive. Removal of the active Collection from the list can be done with the 'delete' function. It will then be put on the list of inactive Collections. You can view the list of your inactive Collections via the link below the active Collections.

Note

Deleting an active Collection does not remove it from the archive. It only moves it to the list of inactive Collections.

Figure 1.4. My Collections view

When viewing an active Collection in more detail, you can see whether there are any Bundles in progress that are associated with the Collection. You can work on them by clicking on their name. You can also validate, archive and re-open one or more Bundles by checking the box in the 'select' column and clicking the desired action button. Note that for these actions, metadata for the Bundle should already be available. See the image below for a detailed overview.

Figure 1.5. My Collections: Collection detail view

My reports

The 'My reports' tab consists of an overview of notifications sent to you by the deposit system. These will mostly be validation and archiving reports of your Bundles. You can see whether a Bundle was validated or archived successfully, or whether it failed. You can view detailed information by clicking the report.

If a Bundle failed to validate, or if archiving failed, you can view the report and try to find out what went wrong, or you can contact the archive staff with the included report and inform them that something went wrong.

You can also delete all reports older than 2 weeks, or choose to delete them all by clicking the appropriate button. This action cannot be reversed!

Figure 1.6. My reports view


Chapter 2. Archiving data

Activate Collection

For MPI users: if you have a new project to archive, inform the data-management team (datasupport@mpi.nl) that you wish to archive a project. They will then create a new empty Collection within your department's section of the archive.

For external users: if you have requested your data to be archived in TLA and your request has been approved, the archive staff will create a new empty Collection in the appropriate section of the archive.

In order to add Bundles or Collections to a Collection, it needs to be 'Activated'. To do so, locate the Collection in the archive browser and click the 'Activate' tab on the Collection page. The Collection will then be listed in the 'My Collections' overview of your dashboard (see the section called “My Collections”). You will only see the 'Activate' tab on Collections to which you've been assigned deposit permissions.

Figure 2.1. Collection option tabs

Once your Collection is active, you can start adding sub-Collections and/or Bundles to it. Bundles can either be added from the 'My Bundles' overview or from the browser-tab shown below. Additional Collections can be added from the archive browser tabs only.

Figure 2.2. Active Collection option tabs

Add Collection

After activating your Collection, you can choose to add a new (sub)Collection (via the 'Add Collection' tab in the archive browser), or to update the metadata of the current Collection (see the section called “Update Collection”).

To add a Collection to an active Collection, browse to the Collection in the archive browser and click the 'Add Collection' tab. You can also reach the Collection via 'My dashboard>My Collections' and clicking the 'Collection in archive' link there. Next, click the 'Add Collection' tab.

Figure 2.3. Add Collection

Select the metadata profile you wish to use. For most language corpora, this should be lat-corpus. For most other types of Collections, you should use MPI_Collection. Collection metadata in other supported profiles can be uploaded as a file. Once you've selected the appropriate profile, fill out all the mandatory fields and any other metadata fields you wish to add. The 'Add' button found under certain metadata fields allows you to add multiple instances of a certain element.

You also need to select an initial access policy for the Collection. In exceptional cases where the Collection needs to be temporarily invisible (in cases where the title alone would give away too much information), you can make the Collection "private" and tick the "hide metadata" box. This will make the Collection only visible to you when searching and browsing the archive, and to no one else. The access policy can be refined later if necessary.

Note that most of the metadata values that you enter for the MPI_Collection will become the default values for any MPI_Bundle or MPI_Collection that you add to the Collection later on, such that you don't need to enter the same information again, but only modify what is relevant.

When you have filled out all of the metadata, click 'Submit'. The Collection will be created and added to your list of active Collections. If you made a mistake while filling out any metadata, you will be notified by the system and you will need to correct the error and try the submission again. The submission of the Collection can take a bit of time. Please wait for it to finish.

Once the Collection has been created, you can update its metadata at a later stage if necessary, see the section called “Update Collection” for additional info.


Add Bundle

Create Bundle

To add actual data files to your Collections, you will need to create Bundles. A Bundle should contain one or more files, in file formats that are accepted by the archive as listed on this page. See the remarks about file names and extensions in the introduction of this manual.

Note

MPI users: you will need to place the data files you wish to add to a Bundle in a subfolder inside your MPI workspace/archive_deposit folder. Your workspace can be found on your department network share. If you do not have a workspace, please contact datasupport@mpi.nl.

External users: you will need to upload the data files you wish to add to a Bundle into a Nextcloud folder. See the section called “Nextcloud data upload (external depositors)” on how to do so.

Note

Spaces and special characters (!@#$%^&*(){}\|:;"'~`<>?/) are not allowed in the names of resource data files. Please remove them prior to creating a new Bundle.
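When many files need to be renamed, a small script can propose compliant names. The Python sketch below is one possible approach and not an official tool: the function name `suggest_file_name` is hypothetical, and replacing whitespace by underscores is a choice made for this sketch. It only proposes a name; it does not rename anything on disk.

```python
import re
import unicodedata


def suggest_file_name(name: str) -> str:
    """Suggest a name that uses only A-Z, a-z, 0-9, dots, hyphens and underscores."""
    # Decompose accented letters into base letter + combining mark, then drop the marks.
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace each run of whitespace by a single underscore (a choice of this sketch).
    underscored = re.sub(r"\s+", "_", without_accents.strip())
    # Remove every character that is still outside the allowed set.
    return re.sub(r"[^A-Za-z0-9._-]", "", underscored)
```

Letters that have no accent-free decomposition (such as "ø") are removed rather than transliterated, so the suggested names should be reviewed by hand. Also check that two different files do not end up with the same suggested name.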

To create a Bundle, go to 'My dashboard>My Bundles' and click 'Initiate new Bundle'. Alternatively, you can initiate a new Bundle via the archive browser. Browse to an active Collection (the section called “Activate Collection”) and choose 'Add Bundle' from the tabs displayed.

On the page that appears, give the Bundle a name, choose the Collection it belongs to, set the access rights that apply, and select the workspace folder that contains the resource data.

Note

The folder you choose for your Bundle cannot contain any subfolders.

Figure 2.4. Create Bundle 1

Below you find a brief explanation of the fields you will need to fill out:

  • Title: enter the title of your Bundle. This title will also be copied into the metadata and will be the label for the Bundle that is displayed in the browser. Titles need to be unique within a given parent Collection.

  • Parent Collection: select the Collection you wish to add the Bundle to. (This is already set if you initiate the Bundle via the 'Add Bundle' tab of the Collection.)

  • Access policies: Select which access policy should be applied to the files within this Bundle. "Public" materials can be accessed by anyone without having to log in. "Authenticated Users" means any user with a valid account for the archive. "Academic Users" are users that log in with an academic account or whose academic status has been verified. "Private" means that the materials are only accessible to the depositor. Access policies can be refined later.

  • How will you provide metadata: this can be done by filling out a web form, or by uploading a CMDI file. Web forms are currently only available for the MPI_Bundle and lat-session profiles.

  • How will you provide files: for MPI users, select a folder within your MPI workspace. For external users, choose 'Select a Nextcloud folder'. See the section called “Nextcloud data upload (external depositors)” below for more information.

  • Department: (only for MPI users) choose your department of the MPI.

  • Workspace: (only for MPI users) choose your personal Workspace folder.

  • Subfolder: select the appropriate folder inside the 'archive_deposit' folder of your workspace. This can be repeated to select further subfolders, until you've reached the final folder that contains the files for your Bundle. As noted, the finally selected folder cannot contain any subfolders and can maximally contain 50 files. The current path you've selected is displayed at the bottom of the form.
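The two folder constraints mentioned above (no subfolders, at most 50 files) can be verified before the folder is selected in the form. The following Python function is a minimal sketch under the assumption that the folder is reachable as a local path, for example on a mounted network share or in a locally synchronised Nextcloud folder; the function name `bundle_folder_problems` is hypothetical.

```python
from pathlib import Path

MAX_FILES_PER_BUNDLE = 50  # limit stated in this manual


def bundle_folder_problems(folder: Path) -> list[str]:
    """Return a list of problems that would prevent using this folder for a Bundle."""
    problems = []
    entries = list(folder.iterdir())
    subfolders = sorted(entry.name for entry in entries if entry.is_dir())
    files = [entry for entry in entries if entry.is_file()]
    if subfolders:
        problems.append("contains subfolders: " + ", ".join(subfolders))
    if len(files) > MAX_FILES_PER_BUNDLE:
        problems.append(f"contains {len(files)} files (limit is {MAX_FILES_PER_BUNDLE})")
    return problems
```

An empty list means that neither constraint is violated. Hidden files created by the operating system count as files in this sketch; whether the deposit system treats them the same way is not covered by this manual.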

When you have filled out everything, click 'save' to continue. The next page will display the created Bundle containing the resources from the selected workspace folder.

Nextcloud data upload (external depositors)

If you are an external depositor of the archive and you would like to upload data, you will need to use the Nextcloud instance hosted at the MPI. This is a cloud-based storage system, which is linked to the depositing system of the archive. The folders you create in Nextcloud can be selected when creating new Bundles (see the section called “Create Bundle”). Note that if you have never logged in to Nextcloud and try to create or modify a Bundle, you will see an error message stating that you need to log in to Nextcloud first.

To be able to create Bundles in conjunction with the Nextcloud server, please follow these steps:

  • Open a new browser tab and go to https://archive.mpi.nl/nextcloud/. You can log in with the same account as you use for the archive; however, the user ID for Nextcloud should always end with @mpi.nl. If your user ID for the archive does not end with @mpi.nl, just add that suffix to it.

    Figure 2.5. Nextcloud login

  • From the main overview page, you can start creating folders by clicking the '+' icon found in the top half of the page. Choose the 'folder' option from the context menu that appears. Once you have created the folders, you can add data to them.

    Figure 2.6. Add folder in Nextcloud

  • To add data to a folder, first click on the appropriate folder to enter it. Next, click the '+' icon again and choose the 'Upload file' function from the context menu. Select the file(s) you wish to upload from your local computer. Alternatively, you can also just drag and drop files (as well as complete folders in most browsers) from your computer to the web page.

    Figure 2.7. Drag & drop in Nextcloud

  • Nextcloud also has clients available that you can download and install on your own computer. These work very similarly to "Dropbox" or similar tools, where you can select a local folder that will be synchronised with the cloud. To use those clients with the MPI Nextcloud, you need to enter the URL of the server: https://archive.mpi.nl/nextcloud along with your login credentials.

    Warning

    Files and folders in the MPI Nextcloud are moved into the archive upon a successful archive action and are then no longer part of your Nextcloud storage. If you synchronise a local folder with Nextcloud, your local copy inside that folder will also be deleted! If you wish to keep a local copy of your files, you should copy rather than move files and folders into your local Nextcloud folder.

  • Once you have uploaded the data you wish to archive, you can return to the Language Archive page and start the creation of a Bundle, see the section called “Create Bundle”. Choose the option 'Select a Nextcloud folder' as the way to provide data for the Bundle you are creating. You can repeatedly select subfolders, until you've reached the one you want to use. As noted, the finally selected folder for the Bundle cannot contain any further subfolders and can contain at most 50 files.

    Figure 2.8. Select Nextcloud folder
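Two points from the steps above lend themselves to a small script: the Nextcloud user ID rule and the advice to copy rather than move data into a synchronised folder. The Python sketch below illustrates both. The function names are hypothetical, and the location of the local Nextcloud folder depends on how the desktop client was configured, so it is passed in as a parameter.

```python
import shutil
from pathlib import Path


def nextcloud_user_id(archive_user_id: str) -> str:
    """Return the user ID to use for Nextcloud: the archive user ID ending in @mpi.nl."""
    suffix = "@mpi.nl"
    if archive_user_id.endswith(suffix):
        return archive_user_id
    return archive_user_id + suffix


def copy_into_sync_folder(source: Path, sync_root: Path) -> Path:
    """Copy (never move) a data folder into the locally synchronised Nextcloud folder.

    The original folder stays in place, so a local copy survives after the
    synchronised copy has been moved into the archive.
    """
    target = sync_root / source.name
    # copytree raises FileExistsError if the target folder already exists,
    # which prevents silently mixing two uploads.
    shutil.copytree(source, target)
    return target
```

A call such as `copy_into_sync_folder(Path("recordings/session_01"), Path.home() / "Nextcloud")` would leave `recordings/session_01` untouched; both paths in this call are placeholders.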


Bundle overview

The Bundle overview displays the files that will be included in the Bundle.

Figure 2.9. Bundle overview

  • Fill in metadata for Bundle: This allows you to fill in the metadata for the data included in the Bundle (the section called “Fill in metadata for Bundle”). Only enabled in case you've opted to fill in the metadata by using a form.

  • Validate Bundle: Here you can check if your resource data and metadata are valid before archiving it (the section called “Validate Bundle”). The data folders of valid Bundles are moved away from the workspace or Nextcloud location, such that they can no longer be modified.

  • Re-open Bundle: When a Bundle is valid, it will be closed so you can archive it. Clicking 'Re-open' will allow you to work on the Bundle again. Data folders will be moved back to their original location, such that they can be modified. After re-opening a Bundle, validation is required again before the Bundle can be archived.

  • Archive Bundle: Click this to archive your valid Bundle. The resource data will be moved to the archive together with the metadata (the section called “Archive Bundle”).

  • Edit Bundle properties: You can edit some of the Bundle properties for Bundles that are still open, e.g. change the name, access policy or select a different data folder.

  • Delete Bundle: Click this if you want to delete the Bundle completely. (Selected files in your workspace will not be deleted).

Note

In case you wish to add additional files to your Bundle, you can do so by adding them to your selected folder as long as the Bundle is still open. If you reload the Bundle page in the browser you will see the current content of the folder.

Fill in metadata for Bundle

After creating the Bundle, choose the 'Fill in metadata for Bundle' button to start creating metadata describing the resource data inside the Bundle. This step is not necessary when you chose to upload a CMDI file containing the metadata. In that case, you can immediately validate the Bundle (the section called “Validate Bundle”).

On the page that will be displayed, you'll have to choose a profile to use for the metadata. After this selection, you will be presented with a form based on the chosen profile that you will have to fill out.

Note

- The choice for the metadata profile will depend on your department and/or prior archiving profiles. The archive manager will have informed you regarding this. If you are not sure, please contact the archive manager (datasupport@mpi.nl).

- Only the items marked with an asterisk are mandatory.

Figure 2.10. Metadata form

The 'Add' button allows you to add multiple instances of a certain element. For instance, if your Bundle contains multiple types of data (e.g. photographs and videos), click 'Add' to add an additional data type.

Notice that some values in the form will already have been filled in. For the MPI_Bundle profile, these values are inherited from the Collection that you have added the Bundle to. You should, however, modify these values where that is appropriate for the Bundle.

When all of the (required) metadata has been filled in, click 'Submit'. If there are errors (e.g. an empty required field), you will be prompted to correct them. Otherwise, you will be taken back to the Bundle overview page where you can further process your Bundle.


Validate Bundle

Once you have successfully created or uploaded metadata for the resource data inside your Bundle, you must validate the Bundle prior to archiving it. The system will then check both the resource data and metadata to make sure the data is acceptable and contains no errors. You can only validate a Bundle after successfully creating a Bundle containing both resources (the section called “Create Bundle”) and metadata (the section called “Fill in metadata for Bundle”).

To validate a Bundle, go to the Bundle overview page ('My dashboard>My Bundles'), and click the 'validate' button. Alternatively, and if you would like to validate multiple Bundles at once, you can go to 'My dashboard>My Collections'. Select the Collection to which the Bundle(s) belong(s), and check the ones you wish to validate.

Once validation starts, you will be taken to the 'My dashboard>My Bundles' page and the progress will be indicated (the section called “My Bundles”). When the validation process is done, the 'Status' column will tell you the outcome (i.e. whether the Bundle is valid, or whether it failed and requires further attention).

Note

Currently, the "My Collections" page is not refreshed automatically. To see updates to the status of the Bundles belonging to a Collection, you need to reload the page in your browser.

Figure 2.11. My Bundles status view

In case the Bundle failed to validate, you will have to make corrections to it. You can check the validation report under 'My reports' for more info regarding the error(s) (the section called “My reports”). You can find more information on how to correct errors in the next section.

If the Bundle is valid, you may proceed with archiving it (see the section called “Archive Bundle”).

Correcting failed Bundles

If a Bundle failed to validate, you will have to correct the error(s) in it. To do so, check the report of the failed Bundle (My dashboard>My reports).

Figure 2.12. Detailed report of failed Bundle

A Bundle may fail to validate due to issues with resource data or metadata. In both cases, you can read from the report what the source of the error is.

  • Invalid resource data: In the example picture above, a specific file did not validate. This happens when a file is not in one of the accepted file formats (see the page of accepted file types). You can choose to remove the file from the workspace folder, or convert the file to an accepted file format. Alternatively, contact datasupport@mpi.nl for help.

  • Invalid metadata: If you entered metadata for the resources in the Bundle, it may happen that something you entered is invalid. In this case, you will have to remove the invalid metadata file and re-create it. To do so, click on the Bundle to go to the Bundle overview page and click 'Edit Bundle'. Then, remove the attached metadata file by clicking 'remove'. Finally, click 'Save' to save the Bundle.

    Figure 2.13. Remove attached metadata file from Bundle

    Back in the Bundle overview page, you can now fill out the new metadata for the Bundle. See the section called “Fill in metadata for Bundle” for more info.

Once you have corrected the error(s), you may validate the Bundle again by clicking on the failed Bundle. You will be taken to the Bundle overview, where you can click 'Validate Bundle' again (the section called “Validate Bundle”).

In case you're seeing different errors or in case your Bundle fails to validate repeatedly, please contact the archive staff.

Figure 2.14. Bundle overview failed Bundle

Archive Bundle

Archiving a Bundle is easy, once it has been validated by the system. Simply go to the 'Bundle overview' page by clicking on the Bundle in the 'My dashboard>My Bundles' page. From there, click the 'Archive Bundle' button.

Figure 2.15. Bundle overview ready for archiving

Alternatively, or if you want to archive multiple Bundles at once, go to 'My dashboard>My Collections', click the Collection to which the Bundle belongs, check the boxes in the 'select' column and click 'Archive Bundle(s)'.

Figure 2.16. Archive multiple Bundles via 'My Collections'

The Bundle(s) will be processed and you will be taken to the 'My dashboard>My Bundles' page. After archiving is complete, a report will be placed in the 'My reports' tab. If anything went wrong during archiving, it will be stated there, so please check the report(s) after archiving your data.


Update content

After archiving your data, you can revise it at any given time. You can update the metadata of a Collection or a Bundle, or add/remove resources from an existing Bundle.

Note

With every update, new Handle persistent identifiers are issued for updated files, Bundles, and Collections. Also the parents of updated items (up to the top of the Collection) will get new Handle persistent identifiers.
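The effect described in this note can be illustrated with a small Python sketch that lists every item receiving a new Handle after one item is updated. The function name, the `parent_of` mapping and the item names in the usage example are hypothetical; the sketch only mirrors the rule stated in the note and does not reflect how the archive assigns Handles internally.

```python
def items_receiving_new_handles(updated_item: str, parent_of: dict[str, str]) -> list[str]:
    """Return the updated item followed by all of its parents, up to the top Collection.

    parent_of maps each item to its direct parent; the top Collection has no entry.
    """
    chain = [updated_item]
    seen = {updated_item}
    while chain[-1] in parent_of:
        parent = parent_of[chain[-1]]
        if parent in seen:
            # Guard against a malformed mapping that loops back on itself.
            raise ValueError("cycle in parent mapping")
        chain.append(parent)
        seen.add(parent)
    return chain
```

With `parent_of = {"audio.wav": "Bundle A", "Bundle A": "Sub-Collection", "Sub-Collection": "Top Collection"}`, updating `audio.wav` yields all four items, whereas sibling Bundles of `Bundle A` are not in the result.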

Update Collection

To update the metadata of a Collection, browse to the Collection in the archive (via 'My dashboard>My Collections>link to active Collection'). Next, click the tab 'Update' to start updating the metadata. When done, click 'Submit' and the metadata for the Collection will be updated.

Figure 2.17. Update Collection metadata

Update Bundle

You can update a previously archived Bundle, by adding or removing resources, but also by updating the metadata. Go to the archived Bundle, and either select the 'Update Bundle' tab (to add/remove resources) or the 'Update metadata' tab (to update metadata).

Figure 2.18. Update Bundle

Update Bundle resources

To update the resources in the Bundle, fill out the form displayed on the screen. Select the appropriate folder in your workspace or Nextcloud containing the additional resource files (If you only want to remove specific resource files, you will currently need to select an empty folder there!). When done, click 'Submit' and you'll be taken to the Bundle overview page.

In the overview, you will see the list of files currently in the archive, which you can select to remove from the Bundle. Below it is the list of files that will be added to the current Bundle.

Figure 2.19. Update Bundle overview

Once you have updated your Bundle, click 'Validate' to check if all data is valid. If everything is valid you may archive the updated Bundle.

If you made a mistake or you noticed not all files that you wish to add are in the new overview, you can click 'delete Bundle' to start over. This does not delete the Bundle from the archive!

Note

Files that are deleted from Bundles are not deleted from the archive, but only removed from the Bundle. They will therefore not be shown when browsing or searching the archive, but they can still be reached if anyone uses their URL or Handle persistent identifier.

Update Bundle metadata

If you only want to update the metadata of a Bundle, you can do so by selecting the 'Update metadata' tab of the archived Bundle. The form will display the currently filled out metadata for the Bundle, and you can make alterations to it. Once done, click 'Submit'.
