Russian Corpora in Tübingen:

Corpus Query Reference Page

To query the corpus, enter a search string into the corpus query form. A search string consists of words or regular expressions. Because the corpus is searched for whole words, the search string must match the words in the corpus exactly. To search for only part of a word, use the string .* (dot + asterisk) as a truncation marker.
The search is case sensitive if the appropriate option ("Case sensitive search") has been selected.
Consecutive words are separated by a blank; the same applies to punctuation marks, which are treated in the same way as words.
The full stop at the end of a sentence, and likewise in abbreviations (e.g. M. S., i t. d.), must be entered as "#", and the question mark must be entered as "\?", because both are special characters in regular expressions.
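The conventions above (whole-word matching, the .* truncation marker, "#" for the full stop, and "\?" for the question mark) can be illustrated with a minimal sketch in Python, whose regular expressions behave like Perl's for these patterns. This is not the corpus software itself. The function names query_to_pattern and query_matches are hypothetical, and the sketch assumes that "#" is the only convention that is not ordinary regular expression syntax and that the search ignores case unless the case sensitive option is chosen.

```python
import re

def query_to_pattern(query, case_sensitive=False):
    """Compile a corpus-style query into a Python regular expression."""
    # Assumption: "#" stands for a literal full stop; everything else
    # in the query is ordinary regular expression syntax.
    body = query.replace("#", r"\.")
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile(body, flags)

def query_matches(query, phrase, case_sensitive=False):
    """Return True if the query covers the whole phrase, not just part of it."""
    return query_to_pattern(query, case_sensitive).fullmatch(phrase) is not None

print(query_matches("dom", "doma"))        # False: whole words only
print(query_matches("dom.*", "doma"))      # True: .* acts as truncation marker
print(query_matches("net #", "net ."))     # True: "#" is the full stop
print(query_matches("net #", "net !"))     # False: "#" is not a wildcard
print(query_matches(r"net \?", "net ?"))   # True: escaped question mark
print(query_matches("i t# d#", "i t. d.")) # True: full stops in abbreviations
print(query_matches("Net doma", "net doma", case_sensitive=True))  # False
```

Using fullmatch rather than search mirrors the whole-word rule: a query without a truncation marker does not find longer words that merely begin with the same letters.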
If the encoding "KOI-8" or "Windows-Codepage 1251" has been selected, search strings may be entered either in the internally used transliteration (cf. transliteration table) or in the selected Cyrillic encoding. Transliteration and Cyrillic encoding may also be mixed within one query, so Latin and Cyrillic characters can appear in the same search. An internal program transforms the search strings before the actual search is performed.
Besides whole words, it is possible to enter Perl regular expressions (cf. the concise introduction to the use of regular expressions and the extensive documentation about regular expressions in Perl).

Examples of corpus queries on the Tübingen Russian Corpora:

simple:
net doma finds: net doma
with truncation marker:
dom.* finds: (any word starting with "dom")
with punctuation:
net \? finds: net ?
net # finds: net .
i t# d# finds: i t. d.
with regular expressions:
dom(a|u|e|o[vm]|ax|ami?)? finds: dom, doma, domu, dome, domom, domov, domam, domami, domax.
(pered|za) domom finds: pered domom, za domom
ee( net)? doma finds: ee doma, ee net doma
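
The regular expression examples above can be checked outside the corpus with a short Python sketch. The helper whole_matches and the candidate lists are hypothetical and serve only as illustration; the lists include a few forms that the patterns should not find (domoj, domik, u doma, ee ne doma).

```python
import re

def whole_matches(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    compiled = re.compile(pattern)
    return [c for c in candidates if compiled.fullmatch(c)]

# Candidate word forms; "domoj" and "domik" are not covered by the pattern.
noun_forms = ["dom", "doma", "domu", "dome", "domom", "domov",
              "domam", "domami", "domax", "domoj", "domik"]
print(whole_matches(r"dom(a|u|e|o[vm]|ax|ami?)?", noun_forms))
# ['dom', 'doma', 'domu', 'dome', 'domom', 'domov', 'domam', 'domami', 'domax']

# Candidate phrases with words separated by a single blank.
phrases = ["pered domom", "za domom", "u doma",
           "ee doma", "ee net doma", "ee ne doma"]
print(whole_matches(r"(pered|za) domom", phrases))
# ['pered domom', 'za domom']
print(whole_matches(r"ee( net)? doma", phrases))
# ['ee doma', 'ee net doma']
```

In the first pattern the outer "?" makes the whole ending optional, which is why the bare form dom is found, and "ami?" covers both domam and domami.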


Michael Betsch
Last modified: Tue Jul 27 09:18:08 MET DST