The tables below list the file types and formats accepted by The Language Archive at the MPI for Psycholinguistics. We distinguish two groups of formats. For long-term preservation formats, we commit to preserving the content, including migrating files to new formats if the current ones become obsolete. For medium-term formats, we guarantee only preservation of the files themselves (bit-stream preservation) for a minimum period of 10 years, with no guarantee that we will be able to migrate them to future formats.
The listed file extensions are obligatory for deposited files; files will be rejected if their extensions differ from what is specified in the tables.
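As an illustration of the extension rule, depositors could pre-check file names before upload. The sketch below is not the archive's own validation code: the function name `has_accepted_extension` and the `ACCEPTED_EXTENSIONS` set are hypothetical, the set holds only a subset of the extensions from the tables, and the case-sensitive comparison is an assumption based on the tables listing variants such as `.Rmd` and `.rmd` separately.

```python
from pathlib import PurePosixPath

# Hypothetical subset of the extensions listed in the tables; extend it to
# the full list before relying on it.
ACCEPTED_EXTENSIONS = {
    ".wav", ".mpg", ".mpeg", ".mp4", ".jpg", ".png", ".tiff", ".svg",
    ".eaf", ".pfsx", ".cha", ".txt", ".html", ".TextGrid", ".R", ".Rmd", ".rmd",
}


def has_accepted_extension(filename: str) -> bool:
    """Return True if the file's final extension is in the accepted set.

    The comparison is case-sensitive (an assumption, see the lead-in), so
    "recording.WAV" is reported as not accepted.
    """
    return PurePosixPath(filename).suffix in ACCEPTED_EXTENSIONS


for name in ["session1.wav", "session1.WAV", "photo.jpeg", "tiers.TextGrid"]:
    print(name, has_accepted_extension(name))
```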
Long-term Preservation Formats
Media resources
| Type | Format | Assigned MIME | Standard MIME | Extension | Remarks |
| --- | --- | --- | --- | --- | --- |
| Audio | WAV | audio/x-wav | audio/x-wav | .wav | 16/24 bit, 44.1/48 kHz, uncompressed PCM, 1 or 2 tracks. |
| Video | MPEG1 | video/x-mpeg1 | video/mpeg | .mpg | Video stream bitrate min. 300 kbit/sec, resolution min. 352x240. Max. 2 audio tracks. |
| | MPEG2 | video/x-mpeg2 | video/mpeg | .mpeg | MPEG2 Program Stream, video stream bitrate min. 3 Mbit/sec, resolution min. 640x480. Max. 2 audio tracks. |
| | MPEG4 | video/mp4 | video/mp4 | .mp4 | Video codec: H.264, audio codec: AAC. Video stream bitrate min. 300 kbit/sec, resolution min. 352x240. Max. 2 audio tracks. |
| Image | JPEG | image/jpeg | image/jpeg | .jpg | |
| | PNG | image/png | image/png | .png | |
| | TIFF | image/tiff | image/tiff | .tiff | |
| | SVG | image/svg+xml | image/svg+xml | .svg | |
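The audio requirements in the WAV row (16 or 24 bit, 44.1 or 48 kHz, uncompressed PCM, 1 or 2 tracks) can be checked from the file header before deposit. The following is a minimal sketch using Python's standard `wave` module, which only reads uncompressed PCM; the function name `wav_problems` and the constant names are hypothetical, and the check is not the archive's own validator.

```python
import wave

ALLOWED_SAMPLE_WIDTHS = {2, 3}          # bytes per sample: 16 or 24 bit
ALLOWED_SAMPLE_RATES = {44100, 48000}   # Hz
ALLOWED_CHANNELS = {1, 2}               # "tracks" in the table


def wav_problems(path):
    """Return a list of problems found in the WAV header (empty if none)."""
    try:
        with wave.open(path, "rb") as wav:
            width = wav.getsampwidth()
            rate = wav.getframerate()
            channels = wav.getnchannels()
    except (wave.Error, EOFError) as exc:
        # The wave module rejects compressed or malformed files.
        return [f"not readable as uncompressed PCM WAV: {exc}"]

    problems = []
    if width not in ALLOWED_SAMPLE_WIDTHS:
        problems.append(f"sample width is {width * 8} bit, expected 16 or 24")
    if rate not in ALLOWED_SAMPLE_RATES:
        problems.append(f"sample rate is {rate} Hz, expected 44100 or 48000")
    if channels not in ALLOWED_CHANNELS:
        problems.append(f"{channels} channels, expected 1 or 2")
    return problems
```

A call such as `wav_problems("session1.wav")` (a hypothetical file name) returns an empty list when the header matches the table.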
Textual resources
| Type | Format | Assigned MIME | Standard MIME | Extension | Remarks |
| --- | --- | --- | --- | --- | --- |
| Structured Annotation | EAF | text/x-eaf+xml | text/xml | .eaf | ELAN annotation file format. |
| | PFSX | text/x-pfsx+xml | text/xml | .pfsx | ELAN settings file for a specific annotation file. Not strictly necessary to preserve but can be archived along with EAF for convenience. |
| | CHAT | text/x-chat | text/plain | .cha | CHILDES/CLAN text format. Use UTF-8 whenever possible. |
| | Toolbox Text | text/x-toolbox-text | text/plain | .tbt | Use UTF-8 whenever possible. |
| | Praat TextGrid | text/praat-textgrid | text/praat-textgrid | .TextGrid | Praat TextGrid annotation file (only plain text variant is accepted, not binary). Use UTF-8 character encoding, not UTF-16. |
| Unstructured Annotation | Plain Text | text/plain | text/plain | .txt | ASCII or UTF-8 character encoding required. |
| | HTML | text/html | text/html | .html | ASCII or UTF-8 character encoding required. |
| | PDF | application/pdf | application/pdf | .pdf | Embed non-standard fonts. |
| Primary Text | Plain Text | text/plain | text/plain | .txt | ASCII or UTF-8 character encoding required. |
| | HTML | text/html | text/html | .html | ASCII or UTF-8 character encoding required. |
| | PDF | application/pdf | application/pdf | .pdf | Embed non-standard fonts. |
| | ODT | application/vnd.oasis.opendocument.text | application/vnd.oasis.opendocument.text | .odt | Open Document Text. |
| Lexicon | Toolbox Lexicon | text/x-toolbox-lexicon | text/plain | .tbx | Use UTF-8 whenever possible. |
| | CHAT lexicon | text/x-cut | text/plain | .cut | Use UTF-8 whenever possible. |
| | Plain Text | text/plain | text/plain | .txt | ASCII or UTF-8 character encoding required. |
| | HTML | text/html | text/html | .html | ASCII or UTF-8 character encoding required. |
| Other | Toolbox type | text/x-toolbox-type | text/plain | .typ | Toolbox type file. |
| | Toolbox language | text/x-toolbox-language | text/plain | .lng | Toolbox language file. |
| | Toolbox sort order | text/x-toolbox-sortorder | text/plain | .set | Toolbox sort order file. |
| | XML | text/xml | text/xml | .xml | Generic XML file. Provide an XML schema for non-standardised formats. |
| | Schema | text/xml | text/xml | .xsd | XML Schema file. |
| | KML | application/vnd.google-earth.kml+xml | application/vnd.google-earth.kml+xml | .kml | Google Earth KML GIS format. |
| | ODS | application/vnd.oasis.opendocument.spreadsheet | application/vnd.oasis.opendocument.spreadsheet | .ods | Open Document Spreadsheet. |
| | ODP | application/vnd.oasis.opendocument.presentation | application/vnd.oasis.opendocument.presentation | .odp | Open Document Presentation. |
| | CSV | text/csv | text/csv | .csv | Comma Separated Values file. ASCII or UTF-8 character encoding required. |
| | R script | text/x-R | text/x-R | .R | ASCII or UTF-8 character encoding required. Preserved as text; compatibility with future R versions cannot be guaranteed. |
| | R markdown | text/x-r-markdown | text/markdown | .Rmd, .rmd | ASCII or UTF-8 character encoding required. |
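Many rows above require ASCII or UTF-8 and, for Praat TextGrid, explicitly rule out UTF-16. One way to test this before deposit is to try decoding the file's bytes as UTF-8, which also accepts ASCII because ASCII is a subset of UTF-8. This is a minimal sketch; the function name `encoding_problem` is hypothetical and the check says nothing about whether the content is otherwise valid for its format.

```python
def encoding_problem(data: bytes):
    """Return None if the bytes decode as UTF-8, otherwise a short reason."""
    # A UTF-16 byte order mark gives a clearer message than a generic
    # decoding error, so test for it first.
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "starts with a UTF-16 byte order mark"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"invalid UTF-8 at byte {exc.start}"
    return None


def file_encoding_problem(path):
    """Read a file in binary mode and apply the same check."""
    with open(path, "rb") as handle:
        return encoding_problem(handle.read())
```

Reading in binary mode matters here: opening the file in text mode would apply a platform-dependent default encoding and could hide the problem.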
Medium-term bit-stream preservation formats
| Type | Format | Assigned MIME | File Extension(s) | Comment |
| --- | --- | --- | --- | --- |
| Binary | NeuroScan image | application/x-neuroscan-img | .img | Needs accompanying .hdr file. |
| | NeuroScan image header | application/x-neuroscan-img-hdr | .hdr | Needed for raw NeuroScan image data. |
| | Brain Vision EEG | application/x-brainvision-data | .eeg .seg | Needs accompanying .vhdr and .vmkr files to open. |
| | SPSS data | application/spss-sav | .sav | |
| | SPSS result view | application/x-spss-spv | .spv | |
| | NeuroScan History file | application/x-ehst | .ehst | |
| | NeuroScan ehtp file | application/x-ehtp | .ehtp | |
| | MATLAB data file | application/x-matlab-data | .mat .fig | |
| | DICOM file | application/dicom | .IMA .ima .dcm | |
| | BAM file | application/x-bam | .bam | Compressed Sequence Alignment Map, gzip compatible. |
| | BAI file | application/x-bai | .bai | Index file for a BAM Sequence Alignment Map. |
| “Text” | Brain Vision Header File | text/x-brainvision-header | .vhdr | Needed for opening raw Brain Vision EEG data. |
| | Brain Vision Marker File | text/x-brainvision-marker | .vmkr | Needed for opening raw Brain Vision EEG data. |
| | NeuroScan History info file | text/x-neuroscan-hfinf | .hfinf | |
| | MATLAB script | text/x-matlab | .mat | |
| | Presentation Script | text/x-presentation-script | .pcl .sce | Neurobehavioral Systems Presentation script. |
| | Presentation Experiment Settings | text/x-presentation-settings | .exp | Neurobehavioral Systems Presentation Experiment Settings. |
| | Praat Pitch | text/praat-pitch | .Praat .praat | Praat Pitch data text file (only plain text variant is accepted, not binary). |
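Several binary formats above are only usable together with companion files (for example, a NeuroScan .img file needs its .hdr file). The sketch below lists data files whose companions are missing from a set of file names. The mapping is copied from the Comment column; the names `REQUIRED_COMPANIONS` and `missing_companions` are hypothetical, and the rule that companions share the data file's base name is an assumption, not something the table states.

```python
from pathlib import PurePosixPath

# Data file extension -> companion extensions named in the Comment column.
REQUIRED_COMPANIONS = {
    ".img": [".hdr"],
    ".eeg": [".vhdr", ".vmkr"],
    ".seg": [".vhdr", ".vmkr"],
}


def missing_companions(filenames):
    """Map each data file to the companion file names that are absent.

    Assumption: a companion has the same base name as the data file and
    differs only in its extension.
    """
    present = set(filenames)
    missing = {}
    for name in filenames:
        path = PurePosixPath(name)
        needed = REQUIRED_COMPANIONS.get(path.suffix, [])
        absent = [
            str(path.with_suffix(ext))
            for ext in needed
            if str(path.with_suffix(ext)) not in present
        ]
        if absent:
            missing[name] = absent
    return missing


print(missing_companions(["s1.eeg", "s1.vhdr", "scan.img", "scan.hdr"]))
```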
The archive accepts ZIP/GZIP files for certain types of collections, but only for the purpose of packaging large numbers of files that belong to one “bundle” and are in accepted formats as specified in the tables above. ZIP/GZIP files are generally not accepted for language corpora.
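Because a ZIP bundle may only contain files in accepted formats, it can help to list the members of an archive whose extensions are not accepted before submitting it. This is a minimal sketch using Python's standard `zipfile` module; the function name `unaccepted_members` is hypothetical, the caller supplies the set of accepted extensions from the tables above, and the check looks at member names only, not at file contents.

```python
import zipfile
from pathlib import PurePosixPath


def unaccepted_members(zip_file, accepted_extensions):
    """Return the names of archive members with an extension outside the set.

    zip_file may be a path or a file-like object; directory entries are
    skipped. Member names inside a ZIP always use forward slashes, hence
    PurePosixPath.
    """
    with zipfile.ZipFile(zip_file) as archive:
        return [
            info.filename
            for info in archive.infolist()
            if not info.is_dir()
            and PurePosixPath(info.filename).suffix not in accepted_extensions
        ]
```

For example, `unaccepted_members("bundle.zip", {".wav", ".eaf"})` (a hypothetical file name) returns every member that is neither a .wav nor an .eaf file.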