Accepted file types and formats

The tables below list the file types and formats accepted by The Language Archive at the MPI for Psycholinguistics. We distinguish two categories: formats that we support for long-term preservation, including migration to newer formats if they become obsolete; and formats for which we guarantee only preservation of the files themselves (bit-stream preservation) for a minimum of 10 years, with no guarantee that we can migrate them to future formats.

The listed file extensions are obligatory for deposited files; files will be rejected if their extensions differ from what is specified in the tables.
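Depositors may want to check extensions before uploading. The following is a minimal sketch of such a check, not the archive's own validation code: the set and function names are hypothetical, the extensions are copied from the long-term preservation tables in this document, and the case-sensitive comparison is an assumption.

```python
from pathlib import PurePath

# Extensions copied from the long-term preservation tables in this document.
# The names below are illustrative and not part of any archive tooling.
LONG_TERM_EXTENSIONS = {
    ".wav", ".mpg", ".mpeg", ".mp4", ".jpg", ".png", ".tiff", ".svg",
    ".eaf", ".pfsx", ".cha", ".tbt", ".TextGrid", ".txt", ".html", ".pdf",
    ".tbx", ".cut", ".typ", ".lng", ".set", ".xml", ".xsd", ".kml",
}


def has_accepted_extension(filename, accepted=LONG_TERM_EXTENSIONS):
    """Return True if the file's final extension is in the accepted set.

    Assumption: the comparison is case-sensitive, so ".WAV" does not
    match ".wav". The archive's actual rule may differ.
    """
    return PurePath(filename).suffix in accepted
```

For example, `has_accepted_extension("session01.eaf")` is true, while a file named `notes.docx` would be flagged before deposit.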

Long-term Preservation Formats

Media resources

| Type | Format | Assigned MIME | Standard MIME | Extension | Remarks |
|---|---|---|---|---|---|
| Audio | WAV | audio/x-wav | audio/x-wav | .wav | 16/24 bit, 44.1/48 kHz, uncompressed PCM, 1 or 2 tracks. |
| Video | MPEG1 | video/x-mpeg1 | video/mpeg | .mpg | Video stream bitrate min. 300 kbit/sec, resolution min. 352x240. Max. 2 audio tracks. |
| Video | MPEG2 | video/x-mpeg2 | video/mpeg | .mpeg | MPEG2 Program Stream, video stream bitrate min. 3 Mbit/sec, resolution min. 640x480. Max. 2 audio tracks. |
| Video | MPEG4 | video/mp4 | video/mp4 | .mp4 | Video codec: H.264, audio codec: AAC. Video stream bitrate min. 300 kbit/sec, resolution min. 352x240. Max. 2 audio tracks. |
| Image | JPEG | image/jpeg | image/jpeg | .jpg | |
| Image | PNG | image/png | image/png | .png | |
| Image | TIFF | image/tiff | image/tiff | .tiff | |
| Image | SVG | image/svg+xml | image/svg+xml | .svg | |
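The WAV requirements listed above (16 or 24 bit, 44.1 or 48 kHz, uncompressed PCM, 1 or 2 tracks) can be checked locally before deposit. The sketch below uses Python's standard `wave` module, which reads uncompressed PCM only; the function name is hypothetical, and treating any file the module cannot parse as non-conforming is an assumption. Depending on the Python version, some valid files with extended headers may be rejected by `wave`, so a negative result is a prompt to inspect the file, not a final verdict.

```python
import wave


def wav_meets_requirements(path):
    """Check a WAV file against the audio requirements in the table above.

    Returns False for files the wave module cannot read (for example
    compressed or non-WAV data); this fallback is an assumption of the sketch.
    """
    try:
        with wave.open(path, "rb") as wav:
            return (
                wav.getsampwidth() in (2, 3)              # 16 or 24 bit
                and wav.getframerate() in (44100, 48000)  # 44.1 or 48 kHz
                and wav.getnchannels() in (1, 2)          # 1 or 2 tracks
            )
    except (wave.Error, EOFError):
        return False
```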

 

 

Textual resources

| Type | Format | Assigned MIME | Standard MIME | Extension | Remarks |
|---|---|---|---|---|---|
| Structured Annotation | EAF | text/x-eaf+xml | text/xml | .eaf | ELAN annotation file format. |
| Structured Annotation | PFSX | text/x-pfsx+xml | text/xml | .pfsx | ELAN settings file for a specific annotation file. Not strictly necessary to preserve but can be archived along with EAF for convenience. |
| Structured Annotation | CHAT | text/x-chat | text/plain | .cha | CHILDES/CLAN text format. Use UTF-8 whenever possible. |
| Structured Annotation | Toolbox Text | text/x-toolbox-text | text/plain | .tbt | Use UTF-8 whenever possible. |
| Structured Annotation | Praat TextGrid | text/praat-textgrid | text/praat-textgrid | .TextGrid | Praat TextGrid annotation file (only plain text variant is accepted, not binary). Use UTF-8 character encoding, not UTF-16. |
| Unstructured Annotation | Plain Text | text/plain | text/plain | .txt | ASCII or UTF-8 character encoding required. |
| Unstructured Annotation | HTML | text/html | text/html | .html | ASCII or UTF-8 character encoding required. |
| Unstructured Annotation | PDF | application/pdf | application/pdf | .pdf | Embed non-standard fonts. |
| Primary Text | Plain Text | text/plain | text/plain | .txt | ASCII or UTF-8 character encoding required. |
| Primary Text | HTML | text/html | text/html | .html | ASCII or UTF-8 character encoding required. |
| Primary Text | PDF | application/pdf | application/pdf | .pdf | Embed non-standard fonts. |
| Lexicon | Toolbox Lexicon | text/x-toolbox-lexicon | text/plain | .tbx | Use UTF-8 whenever possible. |
| Lexicon | CHAT lexicon | text/x-cut | text/plain | .cut | Use UTF-8 whenever possible. |
| Lexicon | Plain Text | text/plain | text/plain | .txt | ASCII or UTF-8 character encoding required. |
| Lexicon | HTML | text/html | text/html | .html | ASCII or UTF-8 character encoding required. |
| Other | Toolbox type | text/x-toolbox-type | text/plain | .typ | Toolbox type file. |
| Other | Toolbox language | text/x-toolbox-language | text/plain | .lng | Toolbox language file. |
| Other | Toolbox sort order | text/x-toolbox-sortorder | text/plain | .set | Toolbox sort order file. |
| Other | XML | text/xml | text/xml | .xml | Generic XML file. Provide an XML schema for non-standardised formats. |
| Other | Schema | text/xml | text/xml | .xsd | XML Schema file. |
| Other | KML | application/vnd.google-earth.kml+xml | application/vnd.google-earth.kml+xml | .kml | Google Earth KML GIS format. |
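Several of the textual formats above require or recommend ASCII or UTF-8 encoding. Since ASCII is a subset of UTF-8, a single decoding attempt covers both cases. The sketch below is one possible pre-deposit check; the function name is hypothetical and reading the whole file into memory is a simplifying assumption suited to typical annotation files.

```python
def is_valid_utf8(path):
    """Return True if the file's bytes decode cleanly as UTF-8.

    Pure ASCII files also pass, because ASCII is a subset of UTF-8.
    A UTF-16 file with a byte order mark fails, since the mark's bytes
    are not valid UTF-8.
    """
    with open(path, "rb") as handle:
        data = handle.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

A file that fails this check (for example a TextGrid saved as UTF-16) would need to be re-saved in UTF-8 from the originating tool before deposit.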

 

Medium-term bit-stream preservation formats

| Type | Format | Assigned MIME | File Extension(s) | Comment |
|---|---|---|---|---|
| Binary | NeuroScan image | application/x-neuroscan-img | .img | Needs accompanying .hdr file. |
| Binary | NeuroScan image header | application/x-neuroscan-img-hdr | .hdr | Needed for raw NeuroScan image data. |
| Binary | Brain Vision EEG | application/x-brainvision-data | .eeg .seg | Needs accompanying .vhdr and .vmkr files to open. |
| Binary | SPSS data | application/spss-sav | .sav | |
| Binary | SPSS result view | application/x-spss-spv | .spv | |
| Binary | NeuroScan History file | application/x-ehst | .ehst | |
| Binary | NeuroScan ehtp file | application/x-ehtp | .ehtp | |
| Binary | MATLAB data file | application/x-matlab-data | .mat .fig | |
| Binary | DICOM file | application/dicom | .IMA .ima .dcm | |
| “Text” | Brain Vision Header File | text/x-brainvision-header | .vhdr | Needed for opening raw Brain Vision EEG data. |
| “Text” | Brain Vision Marker File | text/x-brainvision-marker | .vmkr | Needed for opening raw Brain Vision EEG data. |
| “Text” | NeuroScan History info file | text/x-neuroscan-hfinf | .hfinf | |
| “Text” | MATLAB script | text/x-matlab | .mat | |
| “Text” | Presentation Script | text/x-presentation-script | .pcl .sce | Neurobehavioral Systems Presentation script. |
| “Text” | Presentation Experiment Settings | text/x-presentation-settings | .exp | Neurobehavioral Systems Presentation Experiment Settings. |
| “Text” | Praat Pitch | text/praat-pitch | .Praat .praat | Praat Pitch data text file (only plain text variant is accepted, not binary). |

 

The archive accepts ZIP files for certain types of collections, but only for the purpose of packaging large numbers of files that belong to one “bundle” and are in accepted formats as specified in the tables above. ZIP files are generally not accepted for language corpora.
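Because a ZIP bundle may contain only files in accepted formats, it can be useful to list any members that fall outside the accepted extensions before packaging is finalised. The sketch below uses Python's standard `zipfile` module; the function name is hypothetical, the caller supplies the set of accepted extensions taken from the tables above, and the case-sensitive comparison is an assumption.

```python
import zipfile
from pathlib import PurePosixPath


def unaccepted_zip_members(zip_path, accepted_extensions):
    """List ZIP members whose extension is not in accepted_extensions.

    Directory entries are skipped. The comparison is case-sensitive,
    which is an assumption of this sketch.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return [
            info.filename
            for info in archive.infolist()
            if not info.is_dir()
            and PurePosixPath(info.filename).suffix not in accepted_extensions
        ]
```

An empty result means every file in the bundle carries an accepted extension; it does not verify that the file contents actually match those formats.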