Adult language corpus

The corpus includes anonymized demographic data on the participants, including age, educational attainment, and languages spoken.
Five groups or pairs of adults were asked to audio-record themselves on a smartphone for one hour during their usual conversations.
The transcription rules were loosely based on the minCHAT format (MacWhinney, 2000), following the DARCLE Annotation Scheme (Casillas et al., 2017). Mentions of the speakers' names were replaced with speaker codes (e.g., FA1: female, adult, 1). Mentions of other names were replaced with NAME followed by a designated number (e.g., NAME2). In the ELAN files, each speaker's tier is accompanied by three additional tiers: 1) tok, containing the speaker's utterance, tokenized automatically in ELAN; 2) vrb (under the verb token), indicating whether a verb was intransitive, causative (transitive causative, where the patient is affected, based on Hopper & Thompson, 1980), non-causative (transitive non-causative), relative_intransitive (intransitive in a relative clause), relative_causative (transitive causative in a relative clause), relative_non-causative (transitive non-causative in a relative clause), or unsure; 3) lem (under the verb token), indicating the lemma of the verb.
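As a rough illustration of how the vrb tier could be tallied, the sketch below counts verb labels in an ELAN (.eaf) file using only the Python standard library. It relies on the general EAF layout (TIER elements carrying a TIER_ID attribute, with annotation text in ANNOTATION_VALUE elements). The tier IDs `vrb@FA1` and `lem@FA1`, the function name `count_verb_labels`, and the embedded sample are assumptions made for this example; the sample is synthetic and is not taken from the corpus, so the tier IDs should be checked against the actual files.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Labels listed for the vrb tier in the description above.
VRB_LABELS = {
    "intransitive", "causative", "non-causative",
    "relative_intransitive", "relative_causative",
    "relative_non-causative", "unsure",
}

# Synthetic EAF fragment for illustration only (not corpus data).
# The tier IDs "vrb@FA1" and "lem@FA1" are assumed naming, not confirmed.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="vrb@FA1">
    <ANNOTATION><REF_ANNOTATION ANNOTATION_ID="a3" ANNOTATION_REF="a1">
      <ANNOTATION_VALUE>causative</ANNOTATION_VALUE>
    </REF_ANNOTATION></ANNOTATION>
    <ANNOTATION><REF_ANNOTATION ANNOTATION_ID="a4" ANNOTATION_REF="a2">
      <ANNOTATION_VALUE>intransitive</ANNOTATION_VALUE>
    </REF_ANNOTATION></ANNOTATION>
  </TIER>
  <TIER TIER_ID="lem@FA1">
    <ANNOTATION><REF_ANNOTATION ANNOTATION_ID="a5" ANNOTATION_REF="a1">
      <ANNOTATION_VALUE>open</ANNOTATION_VALUE>
    </REF_ANNOTATION></ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def count_verb_labels(eaf_xml, tier_prefix="vrb"):
    """Count annotation values on every tier whose ID starts with tier_prefix."""
    root = ET.fromstring(eaf_xml)
    counts = Counter()
    for tier in root.iter("TIER"):
        if not tier.get("TIER_ID", "").startswith(tier_prefix):
            continue  # skip speaker, tok, and lem tiers
        for value in tier.iter("ANNOTATION_VALUE"):
            label = (value.text or "").strip()
            if label:
                counts[label] += 1
    return counts


counts = count_verb_labels(SAMPLE_EAF)
unexpected = set(counts) - VRB_LABELS  # labels outside the documented set
print(dict(counts), unexpected)
```

To run this on a real file, the file contents can be read as text and passed to `count_verb_labels`; any label reported in `unexpected` would point to a typo or an undocumented category on the vrb tier.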


