TüPP-D/Z
Corpus
A01: Representation and Automatic Acquisition of Linguistic Data
|
Version | 3rd September, 2004 |
TüPP-D/Z is a collection of articles from the taz newspaper ("die tageszeitung") which have been automatically annotated with clause structure, topological fields, and chunks, in addition to more low-level annotation including parts of speech and morphological ambiguity classes. All texts have been processed automatically, starting from paragraph, sentence and token segmentation. Word forms include information about some regular types of named entities, including dates, telephone numbers, and number/unit combinations. |
Annotation layers | ALLLAYERSUNIFIED, Clause, Chunk, Field, Named Entities, Lexeme |
Number of associated resource files | 30350 |
|
TüBa-D/Z
Corpus
A01: Representation and Automatic Acquisition of Linguistic Data
|
Version | R3 15th August, 2006 |
The Tübinger Baumbank des Deutschen / Zeitungssprache (TüBa-D/Z; Tübingen Treebank of Written German) is a syntactically annotated German newspaper corpus based on data taken from the daily issues of "die tageszeitung" (taz). The treebank currently comprises approximately 36000 sentences (640000 words). The annotation is performed manually and is still ongoing. |
Annotation layers | Clause, Lexeme, Field, Discourse, Named Entities, ALLLAYERSUNIFIED, Grammatical Function, Phrase |
Number of associated resource files | 10314 |
|
Warao Lexicon
Corpus
A02: Linguistic Theories as Data Types
|
Version | 16th December, 2008 |
The Warao lexicon is a compilation of Warao words collected from native speakers and bundled with information about their dialect, context and morphology, as well as glosses and translations in English, German and Spanish. The lexicon is also linked with a list of native speakers containing some basic information about each one. |
Annotation layers | n.a. |
Number of associated resource files | 12 |
|
SINBAD
Corpus
A03: Suboptimal Syntactic Structures
|
Version | 16th December, 2008 |
The aim of the sentence collection (SINBAD) is to provide researchers with access to a large body of (suboptimal) example sentences and their grammaticality judgements from the literature and from Project A3 empirical work. |
Annotation layers | n.a. |
Number of associated resource files | 11 |
|
CoDII
Corpus
A05: Distributional Idiosyncrasies in Logical Form
|
Version | 16th December, 2008 |
The Collection of Distributionally Idiosyncratic Items (CoDII) is a linguistic resource on lexical items which have highly idiosyncratic occurrence patterns. |
Annotation layers | n.a. |
Number of associated resource files | 15 |
|
Uppsala Corpus
Corpus
B01: Corpus Based Analysis of Forms of Address and Politeness in the Slavonic Languages
|
Version | 20th July, 2000 |
The Uppsala Corpus of modern Russian texts was developed at the Department of Slavic Studies at Uppsala University, Sweden, under the direction of Lennart Lönngren, from whom we obtained the permission to use the Uppsala corpus for the SFB 441 project B01. All rights regarding the Uppsala corpus belong to the author. Corpus data may be used for research purposes only; commercial use of the corpus is prohibited. This corpus (Upsal'skij korpus russkix tekstov) consists of some 600 Russian texts with a total of one million running words (word tokens), equally divided between informative and literary prose. The informative texts are from between 1985 and 1989, while the literary texts, whose vocabulary does not date as quickly, cover a longer period, 1960-88. The corpus does not include poetry or drama. |
Annotation layers | n.a. |
Number of associated resource files | 9 |
|
Götz von Berlichingen
Corpus
B03: Modal Verbs and Modality in German
|
Version | 13th September, 2000 |
The Early Modern High German text "Götz von Berlichingen" was digitised by the SFB 441 project B3. The original text was scanned, OCR processed and manually corrected. The encoding follows the TUSNELDA standards. In order to preserve the line numbers of the source text, we used the TUSNELDA "poem" element. Pages starting with an n=0 paragraph refer to the preceding paragraphs. |
Annotation layers | n.a. |
Number of associated resource files | 5 |
|
Alltagserzählungen
Corpus
B03: Modal Verbs and Modality in German
|
Version | 1st January, 2000 |
This annotated data collection was constructed from recordings of monologues and their subsequent transcription. Participants were asked to talk about current events that had influenced their lives in a good or bad way. The research question of this study concerns the processing and acquisition of German coordinate structures. |
Annotation layers | n.a. |
Number of associated resource files | 9 |
|
LexiTypeDia
Corpus
B06: Lexical Motivation in French, Italian and German
|
Version | 1st January, 2000 |
Cognitive patterns about body parts, specifically, in the domain of the head. |
Annotation layers | n.a. |
Number of associated resource files | 11 |
|
Motivational Partner
Corpus
B06: Lexical Motivation in French, Italian and German
|
Version | 16th December, 2008 |
A lexicon of analyzed answers to a semi-open questionnaire designed to obtain motivational partners of word+meaning stimulus units. |
Annotation layers | n.a. |
Number of associated resource files | 13 |
|
Polysemy
Lexicon
B06: Lexical Motivation in French, Italian and German
|
Version | 16th December, 2008 |
A sentence collection constructed by conducting a sentence generation and definition task targeted at word sense disambiguation performed by the informants in a set environment. |
Annotation layers | n.a. |
Number of associated resource files | 13 |
|
Semantic Relation
Corpus
B06: Lexical Motivation in French, Italian and German
|
Version | 16th December, 2008 |
A lexicon of analyzed answers to a questionnaire targeted at specifying the semantic relations between motivated and motivating stimulus units. |
Annotation layers | n.a. |
Number of associated resource files | 13 |
|
BKS-Korpus
Super Corpus Group
B08: Corpusbased Analysis of Local and Temporal Deictics in (Spontaneously) Spoken and (Reflected) Written Language
|
Version | 14th September, 2001 |
The BKS Corpus consists of three subcorpora: (a) Comic Corpus, (b) Bosnian Interviews, (c) Novosadski Corpus of Spoken Language. The research interest of the SFB 441 project B8 lies in the use of the Bosnian/Croatian/Serbian v/t/n-deictics in different text classes. |
Annotation layers | n.a. |
Number of associated resource files | 0 |
|
Bosnische Interviews
Corpus
B08: Corpusbased Analysis of Local and Temporal Deictics in (Spontaneously) Spoken and (Reflected) Written Language
|
Version | 14th September, 2001 |
Part of the BKS-Korpus corpus group. The subcorpus Bosnian Interviews contains 13 narrative interviews which were conducted with Bosnian refugees (Croats, Muslims and Serbs) in 1994. These texts are well suited to any type of research on Bosnian spoken-language phenomena. The research interest of our project lies in the use of the Bosnian/Croatian/Serbian v/t/n-deictics in narrative conversational situations. |
Annotation layers | Editorial Notes, Deictics, Conversation |
Number of associated resource files | 124 |
|
Comic Korpus
Corpus
B08: Corpusbased Analysis of Local and Temporal Deictics in (Spontaneously) Spoken and (Reflected) Written Language
|
Version | 14th September, 2001 |
Part of the BKS-Korpus corpus group. The Comic Corpus consists of several Serbian Asteriks comic strips. Some of the Serbian comics were originally written in Cyrillic; the texts were transcribed into Latin script in order to have all comic texts in the same encoding. The comic texts are well suited to any type of research on imitated spoken-language phenomena. The research interest of our project lies in the use of the Bosnian/Croatian/Serbian v/t/n-deictics in combination with a pointing gesture in a typical demonstratio ad oculos situation including all extralinguistic information given by the communication situation. For that purpose the panels that include deictics and pointing gestures were digitised and added to the corpus. |
Annotation layers | n.a. |
Number of associated resource files | 8 |
|
BraToLi-Korpus
Corpus
B09: Local and Temporal Deixis in the Romance Languages: History and Variation
|
Version | 7th June, 2000 |
The BraToLi corpus contains transcriptions of soccer match commentaries (TV and radio) as well as conversations about steering-wheel locks. Languages include Brazilian Portuguese, European Spanish (Toledo) and American Spanish (Lima). |
Annotation layers | n.a. |
Number of associated resource files | 8 |
|
TüPoDia-Korpus
Corpus
B09: Local and Temporal Deixis in the Romance Languages: History and Variation
|
Version | 7th June, 2000 |
The TüPoDia corpus contains editions of Portuguese texts, specifically collected on a historical basis. The texts were digitized in order to enable automatic analyses with regard to, for example, word frequencies. |
Annotation layers | Editorial Notes, Deictics, Text Structure |
Number of associated resource files | 62 |
|
TüTeAM
Corpus
B10: Typology and Logical Form of Sentential Negation
|
Version | 7th June, 2000 |
The TüTeAM corpus contains about 2800 entries from Ancient Greek, German, English, Italian, Hungarian, Latin, Swedish, Russian, Ukrainian and Bulgarian. The data come from various sources: linguistic literature (the "classics" on tense and aspect), fiction, documentary evidence. Examples appear in the original script, if necessary with transliteration, English or German gloss and translation. The examples also contain either an indication of the source or a complete bibliographic reference. Sentences are analysed according to various criteria: tense and aspect morphology, types of time adverbials, Aktionsarten. The analysis allows a specific search for similar phenomena in a variety of languages and makes the discovery of typological regularities easier. |
Annotation layers | n.a. |
Number of associated resource files | 16 |
|
TüNeg
Corpus
B10: Typology and Logical Form of Sentential Negation
|
Version | 7th June, 2000 |
The TüNeg database contains about 2700 entries from mostly the same languages as the TüTeAM database using sources similar in kind. Sentences are analysed according to the following criteria: licensing environment, different possible readings, negative polarity items involved, types of morphological negation and the general type of negation. Furthermore, where appropriate, contrasting examples have been recorded. |
Annotation layers | n.a. |
Number of associated resource files | 16 |
|
TVP (Tibetische Version Papagaienbuch)
Corpus
B11: Semantic Roles, Case Relations, and Cross-Clausal Reference in Tibetan
|
Version | 7th June, 2000 |
Semantic roles, case relations, and cross-clausal reference in Tibetan. |
Annotation layers | n.a. |
Number of associated resource files | 15 |
|
Satzkonnektoren Altspanisch
Corpus
B14: Discourse Traditions of Romance Languages and Multidimensional Analysis of Diachronic Corpora
|
Version | 7th June, 2000 |
Discourse Tradition of Romance Languages and multi-dimensional Corpus Analysis. |
Annotation layers | n.a. |
Number of associated resource files | 22 |
|
Satzkonnektoren Surselvisch
Corpus
B14: Discourse Traditions of Romance Languages and Multidimensional Analysis of Diachronic Corpora
|
Version | 7th June, 2000 |
Discourse Tradition of Romance Languages and multi-dimensional Corpus Analysis. |
Annotation layers | n.a. |
Number of associated resource files | 18 |
|
Zustandpassiv
Corpus
B18: Grammar and Pragmatics of the German Stative Passive
|
Version | 16th December, 2008 |
The "Grammatik und Pragmatik des Zustandspassivs" is a sentence collection resulting from an investigation of the meaning of the German stative passive. |
Annotation layers | n.a. |
Number of associated resource files | 11 |
|
Gradkonstruktionen
Corpus
B17: Comparative Constructions
|
Version | 1st March, 2008 |
The database presents parallel sets of data on comparison constructions from 15 languages: Bulgarian, Guaraní (an Amerindian language spoken mostly in Paraguay), Hindi, Hungarian, Japanese, Mandarin Chinese, Mooré (a Gur language), Motu (from Papua New Guinea), Romanian, Russian, Samoan, Spanish, Thai, Turkish and Yorùbá (a Kwa language). The sentences have been elicited from naive informants with the help of language-specific questionnaires. The goal has been an in-depth study of those languages, with a view to working out how their grammars differ so as to yield the diverse empirical picture that comparisons present across languages. Each language set contains at most 19 examples presented in the following order: 1) a descriptive part that exemplifies the basic types of degree constructions in the given language (predicative phrasal, adverbial and attributive comparative, comparative of quantity, clausal comparative, equative, less-comparative, positive, superlative, too/enough-constructions) and gives an impression of the systematicity of degree constructions in the syntax and semantics of the language; 2) data that pertain to different aspects of cross-linguistic variation in the semantics of degree (differential comparative, comparison with a degree, 'negative island effect' test, tests for scope interactions of the comparative with the modals, degree question, measure phrase construction, subcomparative). Examples appear partly in the original script and are provided with the gloss, the translation, the grammaticality/felicity judgement and the context/reading where necessary. The judgement field contains felicity judgements for the scope interaction examples (supplied with the relevant contexts or readings) and grammaticality judgements for the rest. The following ranking has been used in both cases: ok (grammatical/felicitous); ? (slightly marked/slightly odd); ?? (marked/odd); * (ungrammatical/infelicitous). 
"n/c" and "n/a" in the judgement field indicate that the example cannot be constructed or the test is not applicable. In the latter case, the comment field in the footer row contains a short explanation. "n/c" and "*" rows usually contain alternative examples (Alt) along with the literal ones (Lit). The former reflect alternative ways to express the relevant meaning, e.g. in the form of paraphrases. |
Annotation layers | n.a. |
Number of associated resource files | 10 |
|
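The judgement scale described for the Gradkonstruktionen database (ok, ?, ??, *, plus the n/c and n/a markers) can be sketched as a small lookup for anyone processing the judgement field programmatically. This is only an illustrative sketch: the names `Judgement`, `parse_judgement` and `acceptability_rank` are hypothetical and are not part of the database or of any TUSNELDA tooling, and the numeric ranks are an assumption made for sorting purposes.

```python
from enum import Enum
from typing import Optional


class Judgement(Enum):
    """Values of the judgement field as listed in the corpus description."""
    OK = "ok"                  # grammatical / felicitous
    SLIGHTLY_MARKED = "?"      # slightly marked / slightly odd
    MARKED = "??"              # marked / odd
    UNACCEPTABLE = "*"         # ungrammatical / infelicitous
    NOT_CONSTRUCTIBLE = "n/c"  # the example cannot be constructed
    NOT_APPLICABLE = "n/a"     # the test is not applicable


# Hypothetical numeric ranks (0 = best); n/c and n/a are deliberately unranked.
_RANK = {
    Judgement.OK: 0,
    Judgement.SLIGHTLY_MARKED: 1,
    Judgement.MARKED: 2,
    Judgement.UNACCEPTABLE: 3,
}


def parse_judgement(raw: str) -> Judgement:
    """Map a raw field value to a Judgement; raises ValueError if unknown."""
    return Judgement(raw.strip().lower())


def acceptability_rank(judgement: Judgement) -> Optional[int]:
    """Return the position on the ok < ? < ?? < * scale, or None for n/c and n/a."""
    return _RANK.get(judgement)
```

Keeping n/c and n/a outside the ranking reflects the description above, where they mark missing or inapplicable examples rather than a degree of acceptability.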