If I run a command like this on the command line:
./opennlp TokenNameFinder en-ner-person.bin "input.txt" "output.txt"
I get person names printed in output.txt, but I want to build my own model so that I can extract my own entities. For example:
- what is the risk value on icm2500.
- Delivery of prd_234 will be arrived late.
- Watson is handling router_34.
If I pass in these lines, it should parse them and extract the product entities. icm2500, prd_234, router_34, etc. are all products (we can save this information in a file and use it as a kind of lookup for the models or OpenNLP).
Can anyone please tell me how to do this?
You'll need to train your own model by annotating some sentences in the OpenNLP format: one sentence per line, tokens separated by whitespace, and each entity wrapped in `<START:type>` and `<END>` tags.
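For the example sentences you posted, an annotated training file might look like the sketch below. The entity type name `product` is my own choice, and note that the punctuation is split off as its own token:

```text
what is the risk value on <START:product> icm2500 <END> .
Delivery of <START:product> prd_234 <END> will be arrived late .
Watson is handling <START:product> router_34 <END> .
```

Three sentences are only enough to show the format. A usable model needs far more annotated sentences than this.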
Make sure each sentence ends with a newline, and if a sentence itself contains newlines, escape or remove them first. Once you have built a file like this from your data, you can use the Java API to train the model.
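A minimal training sketch, assuming OpenNLP 1.6 or later is on the classpath. The class name `ProductModelTrainer` and the file names `train.txt` and `en-ner-product.bin` are placeholders of mine, not anything OpenNLP requires:

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderFactory;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

class ProductModelTrainer {

    // Reads annotated sentences (one per line) and writes a trained model file.
    static void train(File trainingFile, File modelFile) throws IOException {
        TokenNameFinderModel model;
        // Each line of the training file becomes one NameSample.
        try (ObjectStream<NameSample> samples = new NameSampleDataStream(
                new PlainTextByLineStream(
                        new MarkableFileInputStreamFactory(trainingFile),
                        StandardCharsets.UTF_8))) {
            // "product" is the default type for untyped <START> tags (assumed name).
            model = NameFinderME.train("en", "product", samples,
                    TrainingParameters.defaultParams(),
                    new TokenNameFinderFactory());
        }
        // Persist the model so it can be loaded later by the name finder.
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(modelFile))) {
            model.serialize(out);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file names; replace with your own paths.
        train(new File("train.txt"), new File("en-ner-product.bin"));
    }
}
```

The default training parameters discard rarely seen features, so a very small training file may fail to train or give a model that finds nothing.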
Now you can use the model with the name finder.
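Loading the trained model and running it could look roughly like this. It is a sketch that assumes the model was saved as `en-ner-product.bin` (a placeholder name) and that the input is tokenized the same way as the training data, with punctuation as separate tokens. `ProductNameFinder` and `findProducts` are names I made up for the example:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.Span;

class ProductNameFinder {

    // Returns the text of every entity the model finds in one tokenized sentence.
    static String[] findProducts(TokenNameFinderModel model, String[] tokens) {
        NameFinderME finder = new NameFinderME(model);
        Span[] spans = finder.find(tokens);
        // Reset document-level context so it does not carry over to later calls.
        finder.clearAdaptiveData();
        return Span.spansToStrings(spans, tokens);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical model file produced by your own training run.
        try (InputStream in = new FileInputStream("en-ner-product.bin")) {
            TokenNameFinderModel model = new TokenNameFinderModel(in);
            String[] tokens = {"Watson", "is", "handling", "router_34", "."};
            for (String name : findProducts(model, tokens)) {
                System.out.println(name);
            }
        }
    }
}
```

Whether `router_34` is actually found depends on how much annotated data the model was trained on.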
Because you may have a definitive, and possibly short, list of product names, you might consider a simple regex approach.
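As a rough sketch of that regex idea, using only the Java standard library (Java 9+ for `List.of`): the product list here is hard-coded for illustration, whereas in practice you would probably load it from your lookup file. `ProductRegexFinder` and its methods are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ProductRegexFinder {

    // Illustrative lookup list; assumed to come from a file in a real setup.
    private static final List<String> PRODUCTS = List.of("icm2500", "prd_234", "router_34");

    private static final Pattern PRODUCT_PATTERN = buildPattern(PRODUCTS);

    // Builds one case-insensitive pattern matching any listed name as a whole word.
    static Pattern buildPattern(List<String> names) {
        StringBuilder alternation = new StringBuilder();
        for (String name : names) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            // Quote each name so regex metacharacters are treated literally.
            alternation.append(Pattern.quote(name));
        }
        return Pattern.compile("\\b(?:" + alternation + ")\\b", Pattern.CASE_INSENSITIVE);
    }

    // Returns every product mention in the text, in order of appearance.
    static List<String> findProducts(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = PRODUCT_PATTERN.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findProducts("Watson is handling router_34."));
    }
}
```

This only finds names that are already in the list, so it suits a fixed product catalogue. A trained model is the better fit if new product names should be recognized from context.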
The OpenNLP docs also cover the NameFinder in a bit more detail.