- a 100 million word collection of samples of written and spoken language
- specifically designated as a database for linguistic analyses, including semantic ones
- Languages: Catalan, English, French, German, Irish, Gaelic, Italian, Swedish
- Speakers: 5-10 per language, each with 10 recordings of the material
- Content: - 1.Nonsense items:
Vowels /i,a,u/ in isolation; VCV sequences, where C = /p,tb,t,d,k,s,z,n,l,S,T/ and the sequences /kl,st/, and V = /i,a,u/.
- 2.Real words:
These match the nonsense sequences above as closely as possible, e.g. the nonsense item /iti/ is matched by the English word "meaty".
- 3.Sentences:
A set of 14 short sentences designed to illustrate the main connected speech processes in the language (e.g. assimilations, weak forms). In some languages, items from the real word corpus appear in the sentences.
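The nonsense-item inventory described under item 1 can be enumerated mechanically. The following is a minimal sketch, not part of the corpus documentation: the function name `vcv_items` and the `symmetric` switch are hypothetical, the consonant symbols are copied verbatim from the list above (including "tb" as printed), and it is an assumption that both vowels of a VCV item are identical, as in the example /iti/. Setting `symmetric=False` gives all vowel pairings instead.

```python
from itertools import product

# Symbols copied verbatim from the corpus description above.
VOWELS = ["i", "a", "u"]
CONSONANTS = ["p", "tb", "t", "d", "k", "s", "z", "n", "l", "S", "T"]  # "tb" kept as printed
CLUSTERS = ["kl", "st"]


def vcv_items(vowels, medials, symmetric=True):
    """Build VCV nonsense items from a vowel set and a set of medial consonants.

    symmetric=True  -> same vowel on both sides (e.g. "iti"); this is an assumption.
    symmetric=False -> every V1-C-V2 combination.
    """
    if symmetric:
        return [v + c + v for c in medials for v in vowels]
    return [v1 + c + v2 for c in medials for v1, v2 in product(vowels, repeat=2)]


# Single consonants and the two clusters are treated alike as medial elements.
items = vcv_items(VOWELS, CONSONANTS + CLUSTERS)
all_pairings = vcv_items(VOWELS, CONSONANTS + CLUSTERS, symmetric=False)

if __name__ == "__main__":
    print(len(items), "symmetric items;", len(all_pairings), "items with all vowel pairings")
```

Under these assumptions the list contains the item "iti" from the real-word example; the actual recording list of the corpus may differ.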
- 1.Sentences:
A set of 460 sentences designed to include the main connected speech processes in English (e.g. assimilations, weak forms).
Orthography
Subjects: 2 speakers (1 male, 1 female) are currently available, and another 38 are planned to be completed by May 2001. The subjects have a variety of accents of English.