Data Formats of Linguistic Resources

TUSNELDA-XML

XML format for representing treebanks, collections of sentences, and lexicons. Developed in project C1. See the TUSNELDA documentation.

NEGRA Export

Plain-text, column-based format for representing treebanks. Originally developed in the NEGRA project of the Collaborative Research Center 378 (Saarland University).
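
To illustrate what "column-based" means here, the following is a minimal Python sketch of reading such a file. It assumes the commonly described layout: sentences delimited by #BOS and #EOS lines, and tab-separated rows with the columns word, POS tag, morphological tag, edge label and parent node, where nonterminal nodes are written as # followed by a number. The sample sentence, the function name parse_export and the dictionary keys are hypothetical and chosen only for illustration; real files may carry additional columns (e.g. lemma, secondary edges, comments), which this sketch ignores.

```python
import re

# Illustrative sample only, not taken from a real corpus.
SAMPLE = (
    "#BOS 1 0 0 0\n"
    "Sie\tPPER\t--\tSB\t500\n"
    "kommt\tVVFIN\t--\tHD\t500\n"
    "#500\tS\t--\t--\t0\n"
    "#EOS 1\n"
)


def parse_export(text):
    """Group rows by sentence, assuming a five-column tab-separated layout."""
    sentences = []
    current = None
    for line in text.splitlines():
        if line.startswith("#BOS"):
            # Second field of the #BOS line is taken as the sentence number.
            current = {"id": line.split()[1], "terminals": [], "nonterminals": []}
        elif line.startswith("#EOS"):
            if current is not None:
                sentences.append(current)
            current = None
        elif current is not None and line.strip():
            # Columns may be padded with several tabs; extra columns are ignored.
            form, tag, morph, edge, parent = re.split(r"\t+", line)[:5]
            row = {"form": form, "tag": tag, "morph": morph,
                   "edge": edge, "parent": parent}
            # Assumption: phrase nodes are written as '#' plus a node number.
            kind = "nonterminals" if re.fullmatch(r"#\d+", form) else "terminals"
            current[kind].append(row)
    return sentences


sentences = parse_export(SAMPLE)
```

Each terminal row points to its parent node by number, so the tree can be rebuilt by following the parent column up to node 0, which in this sketch is assumed to denote the root.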

Export-XML

XML version of NEGRA Export, developed in project A1. Additional information and a toolset for converting NEGRA Export into Export-XML and back are available here. Anaphora-XML, an extension of Export-XML, supports the representation of referential relations between the nodes in a treebank.

DEREKO-XML

XML representation originally developed at Tübingen University as part of the DEREKO project. The format is similar to Export-XML, but is designed to minimize storage overhead and is thus especially suitable for very large corpora. Furthermore, it supports the ambiguous annotation of POS tags and morphological analyses.


Last update 03/11/2009