I have tagged a simple sentence, and this is my code:
package tagger;

import edu.stanford.nlp.tagger.maxent.MaxentTagger;

public class myTag {
    public static void main(String[] args) {
        MaxentTagger tagger = new MaxentTagger("D:/tagger/english-bidirectional-distsim.tagger");
        String sample = "i go to school by bus";
        String tagged = tagger.tagString(sample);
        System.out.println(tagged);
    }
}
This is the output:
Reading POS tagger model from D:/tagger/english-bidirectional-distsim.tagger ... done [3.0 sec].
i_LS go_VB to_TO school_NN by_IN bus_NN
Editing the properties file has no effect at all.
For example, I changed the tag separator to ( * ), but the output still prints ( _ ).
How can I use the model's config file in Eclipse?
You can load the properties file and pass it to the MaxentTagger constructor, something like this:
import java.io.FileReader;
import java.util.Properties;

Properties props = new Properties();
props.load(new FileReader("path/to/properties")); // throws IOException
MaxentTagger tagger = new MaxentTagger("D:/tagger/english-bidirectional-distsim.tagger", props);
You can also set properties on the props object directly:
props.setProperty("tagSeparator", "*");
NB: if you use the original properties file and it fails with an exception like
java.io.FileNotFoundException: /u/nlp/data/pos_tags_are_useless/egw4-reut.512.clusters (No such file or directory)
then remove the arch and trainFile attributes.
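For reference, here is a minimal sketch that puts these pieces together using only java.util.Properties. The class name TaggerConfig, the prepare helper, and the sample property values are made up for illustration. The MaxentTagger call is left as a comment because it needs the Stanford jar and the model file on your machine.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

public class TaggerConfig {

    // Loads the properties, drops the two keys mentioned in the NB above,
    // and overrides the tag separator.
    static Properties prepare(Reader source, String tagSeparator) throws IOException {
        Properties props = new Properties();
        props.load(source);
        props.remove("arch");
        props.remove("trainFile");
        props.setProperty("tagSeparator", tagSeparator);
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for the contents of the properties file;
        // in practice you would pass a FileReader for the real file.
        String sample = "arch = someArch\ntrainFile = /some/train/file\ntagSeparator = _\n";
        Properties props = prepare(new StringReader(sample), "*");
        props.list(System.out);

        // With the Stanford jar on the classpath, props would then be passed on:
        // MaxentTagger tagger = new MaxentTagger("D:/tagger/english-bidirectional-distsim.tagger", props);
    }
}
```

Doing the cleanup in code means the original properties file stays untouched, so you can still compare against it later.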
Instead of writing Java code for this, you can use the shell script that comes in the downloaded ZIP file.
After extracting the POS tagger's ZIP file, edit the following script:
stanford-postagger.sh
It should have the following line:
java -mx300m -cp 'stanford-postagger.jar:lib/*' edu.stanford.nlp.tagger.maxent.MaxentTagger -model $1 -textFile $2
Add a "-tagSeparator [YourTag]" parameter after "-model $1". Quote the separator so that the shell does not expand a character like * as a wildcard:
java -mx300m -cp 'stanford-postagger.jar:lib/*' edu.stanford.nlp.tagger.maxent.MaxentTagger -model $1 -tagSeparator '*' -textFile $2
To run it (make sure the script has execute permission):
./stanford-postagger.sh models/model_name.tagger in_filename > out_filename
Voilà!