Project C2:
Sustainability of Linguistic Data

Joint project between the SFBs 441, 538 and 632

  SFB 441
  SFB 538
  SFB 632

Heads of the project

Prof. Dr. Marga Reis
Deutsches Seminar
Universität Tübingen
Wilhelmstraße 50
72074 Tübingen
Tel. +49/7071/29-76741; Fax +49/7071/29-5321
email: mer <[at]> uni-tuebingen.de
 
Prof. Dr. Erhard Hinrichs
Seminar für Sprachwissenschaft
Universität Tübingen
Wilhelmstr. 19
72074 Tübingen
Tel. +49/7071/29-75446
Fax. +49/7071/29-5214
 

Staff

SFB 441 – Tübingen University

Andreas Witt
Office: Nauklerstr. 35, D-72074 Tübingen, Room 2.06
Phone: +49/7071/29-77155
Fax: +49/7071/29-5830
Mail: andreas.witt <[at]> uni-bielefeld.de

Georg Rehm
Office: Nauklerstr. 35, D-72074 Tübingen, Room 2.06
Phone: +49/7071/29-77155
Mail: georg.rehm <[at]> uni-tuebingen.de

Dipl.-Inform. Oliver Schonefeld
Office: Nauklerstr. 35, D-72074 Tübingen, Room 2.06
Phone: +49/7071/29-77155
Fax: +49/7071/29-5830
Mail: Oliver.Schonefeld <[at]> uni-tuebingen.de

SFB 538 – Hamburg University

Timm Lehmberg
Office: Max-Brauer-Allee 60, D - 22765 Hamburg, Room 116
Phone: +49/7071/29-77155
Fax: +49/7071/29-5830
Mail: Timm.Lehmberg <[at]> uni-hamburg.de

SFB 632 – Potsdam University

Christian Chiarcos
Office: Potsdam University — Campus GOLM, Karl-Liebknecht-Str. 24-25, D-14476 Golm

Phone: +49 331-977-2664
Mail: chiarcos <[at]> uni-potsdam.de

Summary

Project C2 prepares language resources to ensure the accessible dissemination and sustainable storage of linguistic corpora. One of its main goals is practical: resources acquired in long-term projects at three Collaborative Research Centres have to be converted into one or more formats in which they remain usable by researchers and applications over the long term. Furthermore, the project will provide unified methods of access to the heterogeneous data acquired in these projects. In addition to preparing existing language corpora, the project will develop general methodologies and rules of best practice.

The linguistic resources dealt with by C2 are highly heterogeneous:

  • the primary data itself is heterogeneous with respect to:
    • size (e.g., single sentences vs. entire articles),
    • text types / data types (e.g., newspaper texts, diachronic texts, dialogues, treebanks, ...),
    • modality (monologue vs. dialogue),
    • categories of information covered by the annotation / annotation levels (e.g., layout, textual structure, morpho-syntax, syntax, ...),
    • underlying linguistic theories,
    • language
  • the annotations require data structures of various types (attribute-value pairs, trees, pointers, etc.)
  • data is annotated by means of different, task-specific annotation tools
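
As an illustration only, the following minimal Python sketch shows one way annotations of these different kinds (attribute-value pairs, trees, pointers) could be held in a single standoff structure over the primary data. It is not the project's actual format: the class names, field names, layer names and the example sentence are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Annotation:
    """One standoff annotation over a character span of the primary data.

    All names here are hypothetical and chosen for illustration only.
    """
    id: str
    layer: str                                               # e.g. "pos", "syntax"
    start: int                                               # inclusive character offset
    end: int                                                 # exclusive character offset
    features: Dict[str, str] = field(default_factory=dict)   # attribute-value pairs
    children: List[str] = field(default_factory=list)        # tree edges, by annotation id
    target: Optional[str] = None                             # pointer to another annotation


@dataclass
class Document:
    """Primary data plus any number of annotation layers that refer to it."""
    text: str
    annotations: Dict[str, Annotation] = field(default_factory=dict)

    def add(self, ann: Annotation) -> None:
        self.annotations[ann.id] = ann

    def covered_text(self, ann_id: str) -> str:
        """Return the stretch of primary data an annotation refers to."""
        ann = self.annotations[ann_id]
        return self.text[ann.start:ann.end]

    def layer(self, name: str) -> List[Annotation]:
        """Return all annotations of one layer, ordered by start offset."""
        found = [a for a in self.annotations.values() if a.layer == name]
        return sorted(found, key=lambda a: a.start)


def build_example() -> Document:
    doc = Document(text="She sleeps")
    # Attribute-value pairs: part-of-speech tags on single tokens.
    doc.add(Annotation("t1", "pos", 0, 3, {"pos": "PRON"}))
    doc.add(Annotation("t2", "pos", 4, 10, {"pos": "VERB"}))
    # Tree: a sentence node that dominates both tokens.
    doc.add(Annotation("s1", "syntax", 0, 10, {"cat": "S"}, children=["t1", "t2"]))
    # Pointer: a dependency from the subject token to its head.
    doc.add(Annotation("d1", "dependency", 0, 3, {"rel": "subj"}, target="t2"))
    return doc
```

Because every layer refers to the primary data only through offsets, layers produced by different tools can be stored side by side and queried through the same `covered_text` and `layer` methods.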

The Collaborative Research Centres involved in the project are the SFB 538 'Multilingualism' at the University of Hamburg, the SFB 632 'Information Structure' at the University of Potsdam and the Humboldt University Berlin, and the SFB 441 'Linguistic Data Structures' at the Eberhard Karls University Tübingen.

Corpora

Publications


Last modified 15 March 2009