If i use a query like this in command line
./opennlp TokenNameFinder en-ner-person.bin "input.txt" "output.txt"
I'll get person names printed in output.txt but I want to write own models such that i should print my own entities.
E.g.
- what is the risk value on icm2500.
- Delivery of prd_234 will be arrived late.
- Watson is handling router_34.
If i pass these lines, it should parse and extract product_entities. icm2500, prd_234, router_34... etc these are all Products( we can save this information in a file and we can use it as look up kind of for models or openNLP).
Can anyone please tel me how to do this ?
You'll need to train your own model by annotating some sentences in the opennlp format. For the example sentences you posted the format would look like this:
what is the risk value on <START:product> icm2500 <END>.
Delivery of <START:product> prd_234 <END> will be arrived late.
Watson is handling <START:product> router_34 <END>.
Make sure each sentence ends in a newline and if there are newlines in the sentence to escape them somehow.
Once you make a file like this out of your data, then you can use the Java API to train the model like this
public static void main(String[] args){
Charset charset = Charset.forName("UTF-8");
ObjectStream<String> lineStream =
new PlainTextByLineStream(new FileInputStream("your file in the above format"), charset);
ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);
TokenNameFinderModel model;
try {
model = NameFinderME.train("en", "person", sampleStream, TrainingParameters.defaultParams(),
null, Collections.<String, Object>emptyMap());
}
finally {
sampleStream.close();
}
try {
modelOut = new BufferedOutputStream(new FileOutputStream(modelFile));
model.serialize(modelOut);
} finally {
if (modelOut != null)
modelOut.close();
}
}
now you can use the model with the namefinder.
Because you may have a definitive, and possibly short, list of product names, you might consider a simple regex approach.
here's the opennlp docs that cover the NameFinder a bit:
http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.namefind.training.tool