Class CorpusMasker

java.lang.Object
  extended by CorpusMasker

public class CorpusMasker
extends java.lang.Object

A console application that masks a corpus XML file using a wide range of different methods.

Version:
0.1, November 2007
Author:
Johannes Dellert

Constructor Summary
CorpusMasker()
           
 
Method Summary
static void main(java.lang.String[] args)
          Masks a corpus XML file using a wide range of different methods. Usage: CorpusMasker (options) [XML-File] [output file]. The full list of options is given under Method Detail.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CorpusMasker

public CorpusMasker()

Method Detail

main

public static void main(java.lang.String[] args)
This program serves to mask a corpus XML file using a wide range of different methods.

Usage: CorpusMasker (options) [XML-File] [output file]

With the following options:
  --userDefPaths
          to let the user specify where to find the information in the trees
  --extendDic
          to specify a dictionary that is going to be used for masking
  --ccf
          to specify a character class file that will determine how replacement patterns are generated
  --strip
          to completely delete the text from the trees
  --x-replace
          to replace the text with x characters
  --buildDic
          to build a dictionary of random replace strings
  --buildDic-aff
          to build a dictionary of random replace strings with preserved morphological information [default]
  --preservePOS=XX,XXX,X
          to exclude selected POS classes from being masked
  --noMorph=XX,XXX,X
          to exclude selected POS classes from affix extraction
  --affixDefOccPerWord=[number between 0 and 1]
          to determine how greedy affix extraction will be
  --affixDefMinOccurs=[integer number]
          to determine how often an affix has to occur for being accepted
  --verbose
          for verbose output of the dictionary building process
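
Example (illustrative only): the following sketch shows one way an argument array matching the usage above might be assembled before being handed to main. The class CorpusMaskerArgsSketch and its buildArgs method are hypothetical and not part of this API. The file names and the POS tags NE and CARD are placeholders, and treating the bounds 0 and 1 as inclusive is an assumption, since the documentation only says "between 0 and 1".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of the documented API.
class CorpusMaskerArgsSketch {

    /**
     * Assembles arguments in the documented order: options first,
     * then the XML file, then the output file.
     */
    static String[] buildArgs(String xmlFile, String outputFile,
                              List<String> preservePos,
                              double occPerWord, int minOccurs,
                              boolean verbose) {
        // Assumption: the bounds 0 and 1 are themselves allowed.
        if (occPerWord < 0.0 || occPerWord > 1.0) {
            throw new IllegalArgumentException(
                    "affixDefOccPerWord must be between 0 and 1");
        }
        List<String> args = new ArrayList<>();
        // Documented default method, passed explicitly here for clarity.
        args.add("--buildDic-aff");
        if (!preservePos.isEmpty()) {
            // POS classes are given as one comma-separated value.
            args.add("--preservePOS=" + String.join(",", preservePos));
        }
        args.add("--affixDefOccPerWord=" + occPerWord);
        args.add("--affixDefMinOccurs=" + minOccurs);
        if (verbose) {
            args.add("--verbose");
        }
        args.add(xmlFile);
        args.add(outputFile);
        return args.toArray(new String[0]);
    }

    public static void main(String[] ignored) {
        // Placeholder file names and POS tags.
        String[] args = buildArgs("corpus.xml", "masked.xml",
                Arrays.asList("NE", "CARD"), 0.5, 3, true);
        System.out.println(String.join(" ", args));
        // With CorpusMasker on the classpath, the array could be passed on:
        // CorpusMasker.main(args);
    }
}
```

Building the array in one place keeps the option spelling and the order of the two file arguments consistent with the usage line. How CorpusMasker reacts to malformed or conflicting options is not specified here, so the range check above is only a precaution on the caller's side.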