I'm using the StanfordCoreNLP API to do some basic NLP programmatically. I need to train a model on my own corpus, but I'd like to use the StanfordCoreNLP interface to run it, because it handles a lot of the dry mechanics behind the scenes and I don't need much specialization there.
I've trained a CRFClassifier that I'd like to use for NER, serialized to a file. Based on the documentation, I'd think the following would work, but it doesn't seem to find my model and instead barfs on not being able to find the standard models (I'm not sure why I don't have those model files, but I'm not concerned about it since I don't want to use them anyway):
```java
import java.util.Properties;
import edu.stanford.nlp.io.IOUtils;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

// String constants
final String serializedClassifierFilename = "/absolute/path/to/model.ser.gz";
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, ner");
props.setProperty("ner.models", serializedClassifierFilename);
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
String fileContents = IOUtils.slurpFileNoExceptions("test.txt");
Annotation document = new Annotation(fileContents);
```
Results in:
```
Adding annotator tokenize
TokenizerAnnotator: No tokenizer type provided. Defaulting to PTBTokenizer.
Adding annotator ssplit
Adding annotator ner
Loading classifier from /path/build/edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... java.io.FileNotFoundException: edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz (No such file or directory)
	at java.io.FileInputStream.open0(Native Method)
	at java.io.FileInputStream.open(FileInputStream.java:195)
	at java.io.FileInputStream.<init>(FileInputStream.java:138)
	at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1554)
	etc., etc.
```
I know I don't have their built-in models (again, not sure why; I just cloned their Git repo and compiled with `ant compile`). Regardless, I don't want to use their models anyway; I want to use the one I trained.
How can I get the StanfordCoreNLP interface to use my model in the `ner` step? Is this possible at all?
The property name is `ner.model`, not `ner.models`, so your code is still trying to load the default models. Let me know if this is documented incorrectly somewhere.
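A minimal sketch of the corrected configuration, assuming a single serialized CRF model at a path you pass in. It uses only `java.util` and `java.nio` so it compiles without CoreNLP on the classpath; the class and method names (`NerPipelineProps`, `buildNerProps`, `requireReadable`) are hypothetical. The three extra `ner.*` switches are an assumption: depending on your CoreNLP version, the `ner` annotator may otherwise try to load SUTime rules, numeric classifiers, or fine-grained RegexNER mappings from the models jar you don't have, so check them against your version's documentation before relying on them.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public class NerPipelineProps {

    // Builds Properties for a tokenize/ssplit/ner pipeline that uses a custom CRF model.
    // The key is "ner.model" (singular); "ner.models" is not read, so the defaults get loaded.
    static Properties buildNerProps(String serializedClassifierFilename) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, ner");
        props.setProperty("ner.model", serializedClassifierFilename);
        // Assumption: these turn off extra default resources the ner annotator may load;
        // verify the property names against the docs for your CoreNLP version.
        props.setProperty("ner.useSUTime", "false");
        props.setProperty("ner.applyNumericClassifiers", "false");
        props.setProperty("ner.applyFineGrained", "false");
        return props;
    }

    // Fails fast with a readable message if the model path is wrong,
    // instead of a FileNotFoundException from deep inside CoreNLP.
    static void requireReadable(String path) throws IOException {
        Path p = Paths.get(path);
        if (!Files.isReadable(p)) {
            throw new IOException("NER model not readable: " + p.toAbsolutePath());
        }
    }

    public static void main(String[] args) throws IOException {
        String model = args.length > 0 ? args[0] : "/absolute/path/to/model.ser.gz";
        Properties props = buildNerProps(model);
        System.out.println("ner.model = " + props.getProperty("ner.model"));
        requireReadable(model);
        // With CoreNLP on the classpath, the next step would be:
        // StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    }
}
```

Run it with your model path as the first argument: it prints the `ner.model` value and then checks that the file is readable. Passing the same `Properties` to `new StanfordCoreNLP(props)` should make the `ner` step load your classifier instead of `english.all.3class.distsim.crf.ser.gz`. The `ner.model` value can also be a comma-separated list if you want to combine several models.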