This SAX parser reads through the whole document once,
collecting all orth elements together with their POS tags and storing
the replacements it generates in a ConversionDictionary.
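A minimal sketch of this single-pass approach, using Python's standard `xml.sax` module. Several parts are assumptions. `ConversionDictionary` here is a hypothetical stand-in keyed by (word, POS) pairs. The POS tag is assumed to sit in a `pos` attribute on each `<orth>` element. `mask` is a hypothetical replacement rule. The real class and document schema may differ.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class ConversionDictionary:
    """Hypothetical stand-in: maps (word, pos) pairs to their replacement."""

    def __init__(self):
        self._entries = {}

    def add(self, word, pos, replacement):
        # Keep the first replacement generated for a given (word, pos) pair
        self._entries.setdefault((word, pos), replacement)

    def get(self, word, pos):
        return self._entries.get((word, pos))

    def __len__(self):
        return len(self._entries)


def mask(word):
    # Hypothetical replacement rule: uppercase -> "A", lowercase -> "a", digit -> "0"
    out = []
    for ch in word:
        if ch.isupper():
            out.append("A")
        elif ch.islower():
            out.append("a")
        elif ch.isdigit():
            out.append("0")
        else:
            out.append(ch)
    return "".join(out)


class OrthHandler(ContentHandler):
    """Collects <orth> text and its POS tag in a single pass over the document."""

    def __init__(self, dictionary, replace=mask):
        super().__init__()
        self.dictionary = dictionary
        self.replace = replace
        self._in_orth = False
        self._pos = None
        self._buf = []

    def startElement(self, name, attrs):
        if name == "orth":
            self._in_orth = True
            # Assumption: the POS tag is stored in a "pos" attribute on <orth>
            self._pos = attrs.get("pos")
            self._buf = []

    def characters(self, content):
        # SAX may deliver one text node in several chunks, so buffer it
        if self._in_orth:
            self._buf.append(content)

    def endElement(self, name):
        if name == "orth":
            word = "".join(self._buf).strip()
            if word:
                self.dictionary.add(word, self._pos, self.replace(word))
            self._in_orth = False


def build_dictionary(xml_bytes):
    dictionary = ConversionDictionary()
    xml.sax.parseString(xml_bytes, OrthHandler(dictionary))
    return dictionary
```

Because `characters()` may be called several times for a single text node, the handler buffers the text and builds the replacement only in `endElement`. The document is never held in memory as a tree.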
A DictExtractor requires three inputs: a specification of the content to be masked, a definition of the content that must remain untouched, and a file defining the character classes used for replacement.
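One way to picture these three inputs is as a small configuration object. This sketch is an assumption, not the actual DictExtractor interface. Masked content is modeled as a set of POS tags, and untouchable content as a set of word forms. The character-class file uses a hypothetical `symbol=characters` line format, with one class per line.

```python
from dataclasses import dataclass, field


def load_char_classes(lines):
    """Parse hypothetical 'symbol=characters' lines into a char -> symbol map.

    Accepts an open file object or any iterable of strings; blank lines
    and lines starting with '#' are skipped.
    """
    table = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        symbol, chars = line.split("=", 1)
        for ch in chars:
            table[ch] = symbol
    return table


@dataclass
class DictExtractorConfig:
    # Assumption: "content to be masked" is given as a set of POS tags
    masked_pos: set
    # Assumption: "untouchable content" is a set of word forms never replaced
    untouchable: set
    # char -> replacement symbol, e.g. load_char_classes(open(path))
    char_classes: dict = field(default_factory=dict)

    def replacement_for(self, word, pos):
        """Return the masked form, or None if the word must stay as-is."""
        if pos not in self.masked_pos or word in self.untouchable:
            return None
        # Characters without a class are copied unchanged
        return "".join(self.char_classes.get(ch, ch) for ch in word)
```

In this sketch, the untouchable check takes precedence over the masking rule. A protected word form is therefore never replaced, even when its POS tag is in the masked set.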