Publications of the SFB 441-Project C2 - Sustainability of Linguistic Data

  • Stefanie Dipper, Erhard Hinrichs, Thomas Schmidt, Andreas Wagner, Andreas Witt. (2006) Sustainability of Linguistic Resources. In: Erhard Hinrichs, Nancy Ide, Martha Palmer, and James Pustejovsky (eds.): Proceedings of the LREC 2006 Satellite Workshop on "Merging and Layering Linguistic Information", Genoa 2006.
  • Thomas Schmidt, Christian Chiarcos, Timm Lehmberg, Georg Rehm, Andreas Witt, and Erhard Hinrichs. (2006) Avoiding Data Graveyards: From Heterogeneous Data Collected in Multiple Research Projects to Sustainable Linguistic Resources. Proceedings of the E-MELD workshop 2006, Michigan State University in East Lansing, Michigan.
  • Kai Wörner, Andreas Witt, Georg Rehm, Stefanie Dipper. (2006) Modelling Linguistic Data Structures. In: Proceedings of Extreme Markup Languages 2006, Montréal, Canada.
  • Georg Rehm, Andreas Witt, Lothar Lemnitzer. (eds., 2007) Datenstrukturen für linguistische Ressourcen und ihre Anwendungen - Data Structures for Linguistic Resources and Applications. Proceedings of the Biennial GLDV Conference 2007. Narr, Tübingen 2007.
  • Timm Lehmberg, Christian Chiarcos, Georg Rehm, Andreas Witt. (2007) Rechtsfragen bei der Nutzung und Weitergabe linguistischer Daten. In: Datenstrukturen für linguistische Ressourcen und ihre Anwendungen - Data Structures for Linguistic Resources and Applications. Proceedings of the Biennial GLDV Conference 2007, Georg Rehm, Andreas Witt, Lothar Lemnitzer (eds.), Narr, Tübingen 2007
  • Georg Rehm, Andreas Witt. (2007) Digital Text Resources for the Humanities – Legal Issues. Digital Humanities 2007 – Session Description. In: Proceedings of Digital Humanities 2007, University of Illinois, Urbana-Champaign, USA, 2007 (slides)
  • Georg Rehm, Andreas Witt, Heike Zinsmeister, Johannes Dellert. (2007) Corpus Masking: Legally Bypassing Licensing Restrictions for the Free Distribution of Text Collections. In: Proceedings of Digital Humanities 2007, University of Illinois, Urbana-Champaign, USA, 2007 (slides; the tool we presented in this talk, CorpusMasker, is now ready for download (see Software section below))
  • Timm Lehmberg, Christian Chiarcos, Erhard Hinrichs, Georg Rehm, Andreas Witt. (2007) Collecting Legally Relevant Metadata by Means of a Decision-Tree-Based Questionnaire System. In: Proceedings of Digital Humanities 2007, University of Illinois, Urbana-Champaign, USA, 2007 (slides)
  • Andreas Witt, Oliver Schonefeld, Georg Rehm, Jonathan Khoo, Kilian Evang. (2007) On the Lossless Transformation of Single-File, Multi-Layer Annotations into Multi-Rooted Trees. In: Extreme Markup Languages, August 7-10, Montréal, Canada, 2007
  • Georg Rehm, Richard Eckart, Christian Chiarcos. (2007) An OWL- and XQuery-Based Mechanism for the Retrieval of Linguistic Patterns from XML-Corpora. In: RANLP 2007: Recent Advances in Natural Language Processing, September 27-29, Borovets, Bulgaria, 2007
  • Andreas Witt, Georg Rehm, Timm Lehmberg, Erhard Hinrichs. (2007) Mapping Multi-Rooted Trees from a Sustainable Exchange Format to TEI Feature Structures. In: TEI@20: 20 Years of Supporting the Digital Humanities. The 20th Anniversary Text Encoding Initiative Consortium Members' Meeting. October 31-November 3, University of Maryland, College Park, USA, 2007
  • Georg Rehm, Andreas Witt, Heike Zinsmeister, Johannes Dellert. (2007) Masking Treebanks for the Free Distribution of Linguistic Resources and Other Applications. In: Proceedings of the Sixth International Workshop on Treebanks and Linguistic Theories (TLT 2007). December 7-8, Bergen, Norway, 2007, pp. 127 – 138, WWW: http://tlt07.uib.no/papers/6.pdf
  • Andreas Witt, Elke Teich, Felix Sasaki, Peter Wittenburg, Nicoletta Calzolari (2008, eds.). Proceedings of the LREC 2008 Workshop Uses and usage of language resource-related standards, May 27, Marrakech, Morocco, 2008. WWW: http://www.lrec-conf.org/proceedings/lrec2008/workshops/W7_Proceedings.pdf
  • Andreas Witt, Georg Rehm, Thomas Schmidt, Khalid Choukri, Lou Burnard (2008, eds.). Proceedings of the LREC 2008 Workshop Sustainability of Language Resources and Tools for Natural Language Processing, May 31, Marrakech, Morocco, 2008. WWW: http://www.lrec-conf.org/proceedings/lrec2008/workshops/W17_Proceedings.pdf
  • Jan-Philipp Soehn, Heike Zinsmeister, Georg Rehm (2008). Requirements of a User-Friendly, General-Purpose Corpus Query Interface. In: Witt, Rehm, Schmidt, Choukri, Burnard (eds.) Proceedings of the LREC 2008 Workshop Sustainability of Language Resources and Tools for Natural Language Processing, May 31, Marrakech, Morocco, 2008. pp. 27 – 32 WWW: http://www.lrec-conf.org/proceedings/lrec2008/workshops/W17_Proceedings.pdf
  • Georg Rehm, Richard Eckart, Christian Chiarcos, Johannes Dellert (2008). Ontology-Based XQuery'ing of XML-Encoded Language Resources on Multiple Annotation Layers. In: Proceedings of LREC 2008, May 28 – 30, Marrakech, Morocco, 2008. WWW: http://www.lrec-conf.org/proceedings/lrec2008/summaries/139.html
  • Georg Rehm, Oliver Schonefeld, Andreas Witt, Timm Lehmberg, Christian Chiarcos, Hanan Bechara, Florian Eishold, Kilian Evang, Magdalena Leshtanska, Aleksandar Savkov, Matthias Stark. (2008). The Metadata-Database of a Next Generation Sustainability Web-Platform for Language Resources. In: Proceedings of LREC 2008, May 28 – 30, Marrakech, Morocco, 2008. WWW: http://www.lrec-conf.org/proceedings/lrec2008/summaries/97.html
  • Georg Rehm, Andreas Witt. Aspects of Sustainability in Digital Humanities. Session Description. In: Proceedings of Digital Humanities 2008, June 25 – 29, Oulu, Finland, 2008. pp. 21 – 22 WWW: http://www.ekl.oulu.fi/dh2008/Digital%20Humanities%202008%20Book%20of%20Abstracts.pdf
  • Georg Rehm, Andreas Witt, Erhard Hinrichs, Marga Reis. Sustainability of Annotated Resources in Linguistics. In: Proceedings of Digital Humanities 2008, June 25 – 29, Oulu, Finland, 2008. pp. 27 – 29 WWW: http://www.ekl.oulu.fi/dh2008/Digital%20Humanities%202008%20Book%20of%20Abstracts.pdf
  • Timm Lehmberg, Georg Rehm, Andreas Witt. Sustainability of Richly Annotated Linguistic Corpora. In: Languages in Contrast: Grammar, Translation, Corpora. 41st Annual Meeting of the Societas Linguistica Europaea, September 17-20, University of Bologna, Forli, Italy, 2008. In print.
  • Georg Rehm, Oliver Schonefeld, Andreas Witt, Christian Chiarcos, Timm Lehmberg. A Web-Platform for Preserving, Exploring, Visualising and Querying Linguistic Corpora and other Resources. In: SEPLN 2008 – 24th Edition of the Conference of the Spanish Society for Natural Language Processing, September 10–12, Madrid, Spain, 2008. In print.
  • Georg Rehm, Oliver Schonefeld, Andreas Witt, Christian Chiarcos, Timm Lehmberg. SPLICR: A Sustainability Platform for Linguistic Corpora and Resources. In: Konferenz zur Verarbeitung natürlicher Sprache (KONVENS 2008), September 30–October 02, Berlin, Germany, 2008. In print.
  • Timm Lehmberg, Georg Rehm, Andreas Witt, Felix Zimmermann. Digital Text Collections, Linguistic Research Data, and Mashups: Notes on the Legal Situation. Library Trends, 57 (1), special issue Digital Books and the Impact on Libraries, 2008.
  • Heike Zinsmeister, Erhard Hinrichs, Sandra Kübler, Andreas Witt. Linguistically Annotated Corpora: Quality Assurance, Reusability and Sustainability. In: A. Lüdeling and M. Kytö (Eds.) Corpus Linguistics. An International Handbook. Mouton de Gruyter, Berlin, 2008.
  • Andreas Witt, Georg Rehm, Erhard Hinrichs, Timm Lehmberg, Jens Stegmann. SusTEInability of Linguistic Resources through Feature Structures. Literary and Linguistic Computing, 2009.
  • Georg Rehm, Oliver Schonefeld, Andreas Witt, Erhard Hinrichs, Marga Reis. Sustainability of Annotated Resources in Linguistics: A Web-Platform for Exploring, Querying and Distributing. Literary and Linguistic Computing, 2009.

Technical Reports

  • Georg Rehm. Spezifikation: Staging Area und Manifest-Datei. Internal technical report, SFB 441, 2008.
  • Oliver Schonefeld. Specification of the System Database and Related Components. Internal technical report, SFB 441, 2008.
  • Matthias Stark, Georg Rehm. Configuration of the XML Editor Oxygen for Editing eTEI Metadata Records. Internal technical report, SFB 441, 2008.


Software

  • CorpusMasker - a software tool implemented in Java for parameterized masking of linguistic resources.
    A first prototype (version 0.1) is available for download.


Last modified Dec 19, 2008 by Andreas Witt