Data Formats of Linguistic Resources

SFB Corpora


XML format for representing treebanks, sentence collections, and lexicons. Developed in project C1. See the TUSNELDA documentation.

NEGRA Export

Plain-text, column-based format for representing treebanks. Originally developed in the NEGRA project of the Collaborative Research Center 378 (Saarland University).
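The column layout can be read with a short script. The following is a minimal Python sketch, not part of the NEGRA tools, that assumes export format version 3: sentences are delimited by `#BOS <id>` and `#EOS <id>` lines, `%%` starts a comment, and each line holds word form (or `#5xx` node number for a non-terminal), tag, morphological information, edge label, and parent node number, optionally followed by secondary-edge (label, parent) pairs. Format version 4 inserts a lemma column after the word form, which this sketch does not handle. The names `Node` and `parse_export` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One line of a NEGRA Export sentence (assumed format version 3)."""
    form: str      # word form, or "#5xx" for a non-terminal node
    tag: str       # POS tag (terminals) or phrase category (non-terminals)
    morph: str     # morphological information, "--" if empty
    edge: str      # label of the edge to the parent node
    parent: int    # number of the parent node (0 = virtual root)
    secedges: list = field(default_factory=list)  # (label, parent) pairs


def parse_export(lines):
    """Return a list of sentences, each a dict of terminals and non-terminals."""
    sentences = []
    current = None
    for raw in lines:
        # "%%" starts a comment that runs to the end of the line
        line = raw.split("%%", 1)[0].strip()
        if not line:
            continue
        if line.startswith("#BOS"):
            current = {"id": int(line.split()[1]), "terminals": [], "nonterminals": {}}
        elif line.startswith("#EOS"):
            if current is not None:
                sentences.append(current)
            current = None
        elif current is None:
            continue  # header material (e.g. "#FORMAT 3") outside any sentence
        else:
            cols = line.split()
            # columns after the parent come in (label, parent) pairs
            sec = list(zip(cols[5::2], (int(p) for p in cols[6::2])))
            node = Node(cols[0], cols[1], cols[2], cols[3], int(cols[4]), sec)
            if cols[0].startswith("#") and cols[0][1:].isdigit():
                current["nonterminals"][int(cols[0][1:])] = node
            else:
                current["terminals"].append(node)
    return sentences
```

Terminals are kept in sentence order, while non-terminals are indexed by their node number so that the `parent` column of any line can be resolved directly.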

Export-XML

XML version of NEGRA Export, developed in project A1. Additional information and a toolset for converting NEGRA Export into Export-XML and back are available here. An extension of Export-XML, Anaphora-XML, supports the representation of referential relations between the nodes in a treebank.


XML representation originally developed at Tübingen University as part of the DEREKO project. The format is similar to Export-XML but is designed to minimize storage overhead, which makes it especially suitable for very large corpora. It also supports ambiguous annotation of POS tags and morphological analyses.
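Ambiguous annotation means that a single token can carry several alternative analyses instead of exactly one. The Python sketch below illustrates the idea by storing a list of (POS tag, morphology) pairs per token and writing them to XML and back with `xml.etree.ElementTree`. The element and attribute names (`t`, `a`, `f`, `pos`, `m`) are hypothetical and do not reproduce the actual schema of the format described above; the short names only hint at how storage overhead can be kept low.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Analysis:
    pos: str    # POS tag (STTS tags used in the example below)
    morph: str  # morphological analysis


def token_to_xml(form, analyses):
    """Encode one token with all alternative analyses (hypothetical element names)."""
    tok = ET.Element("t", f=form)
    for a in analyses:
        ET.SubElement(tok, "a", pos=a.pos, m=a.morph)
    return tok


def analyses_from_xml(tok):
    """Recover the list of alternative analyses from a <t> element."""
    return [Analysis(a.get("pos"), a.get("m")) for a in tok.findall("a")]


# German "die" can be read as an article or as a relative pronoun
die = token_to_xml("die", [Analysis("ART", "Nom.Sg.Fem"), Analysis("PRELS", "Nom.Sg.Fem")])
xml_text = ET.tostring(die, encoding="unicode")
restored = analyses_from_xml(ET.fromstring(xml_text))
```

Keeping the alternatives as sibling elements of one token lets a later processing step select or discard readings without changing the token sequence.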

Last update 03/11/2009