Load Custom NER Model Stanford CoreNLP

Posted 2020-02-26 12:23

Question:

I have created my own NER model with Stanford's "Stanford-NER" software and by following these directions.

I am aware that CoreNLP loads three NER models out of the box in the following order:

  1. edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz
  2. edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz
  3. edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz

I now want to include my NER model in the list above and have the text tagged by my NER model first.

I have found two previous StackOverflow questions on this topic: 'Stanford OpenIE using customized NER model' and 'Why does Stanford CoreNLP NER-annotator load 3 models by default?'

Both of these posts have good answers. The general message of the answers is that you have to edit code within a file.

Stanford OpenIE using customized NER model

This post says to edit corenlpserver.sh, but I cannot find this file in the Stanford CoreNLP download. Can anyone point me to this file's location?

Why does Stanford CoreNLP NER-annotator load 3 models by default?

This post says that I can use the -ner.model argument to specify which NER models to load. I added this argument to the initial server command (java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000 -ner.model *modelfilepathhere*). This did not work: the server still loaded all three models.

It also states that you have to change some Java code, though it does not say where to make the change.

Do I need to modify or add this code props.put("ner.model", "model_path1,model_path2"); to a specific class file in the CoreNLP software?

QUESTION: From my research it seems that I need to add or modify some code to call my custom NER model. These 'edits' are outlined above, and the information comes from other StackOverflow questions. Which files specifically do I need to edit? Where exactly are these files located (e.g. edu/stanford/nlp/...etc)?

EDIT: My system is running on a local server and I'm using the pycorenlp API to open a pipeline to my local server and make requests against it. The two critical lines of Python/pycorenlp code are:

  1. nlp = StanfordCoreNLP('http://localhost:9000')
  2. output = nlp.annotate(evalList[line], properties={'annotators': 'ner, openie','outputFormat': 'json', 'openie.triple.strict':'True', 'openie.max_entailments_per_clause':'1'})

I do NOT think this will affect my ability to call my unique NER model but I wanted to present all the situational data I can in order to obtain the best possible answer.

Answer 1:

If you want to customize the pipeline the server uses, create a file called server.properties (or you can call it whatever you want).

Then add the option -serverProperties server.properties to the java command when you start the server.

In that .properties file you should include ner.model = /path/to/custom_model.ser.gz

In general you can customize the pipeline the server will use in that .properties file. For instance you can also set the list of annotators in it with the line annotators = tokenize,ssplit,pos,lemma,ner,parse etc...
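As a minimal sketch of the setup described above, the Python helpers below render the text of such a .properties file and build the matching server command. The function names and the keep_defaults option are hypothetical. The sketch assumes, based on the comma-separated ner.model value and the default load order quoted in the question, that listing the custom model first makes it the first model applied; check this against your CoreNLP version.

```python
import os

# The three default English models listed in the question, in their default order.
DEFAULT_NER_MODELS = [
    "edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz",
    "edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz",
    "edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz",
]


def render_properties(custom_model, keep_defaults=True, annotators=None):
    """Return the text of a .properties file, one setting per line."""
    # Assumption: models are applied in the order listed, so the custom one goes first.
    models = [custom_model]
    if keep_defaults:
        models += DEFAULT_NER_MODELS
    lines = []
    if annotators:
        lines.append("annotators = " + ",".join(annotators))
    lines.append("ner.model = " + ",".join(models))
    return "\n".join(lines) + "\n"


def server_command(properties_file, port=9000, timeout=15000, memory="8g"):
    """Build the argument list for starting the CoreNLP server."""
    return [
        "java", f"-Xmx{memory}", "-cp", "*",
        "edu.stanford.nlp.pipeline.StanfordCoreNLPServer",
        "-port", str(port),
        "-timeout", str(timeout),
        # An absolute path does not depend on the directory the command is run from.
        "-serverProperties", os.path.abspath(properties_file),
    ]
```

Writing the returned text to server.properties and passing the list from server_command("server.properties") to subprocess.Popen, from the folder that holds the CoreNLP jars, would be one way to start the server with these settings.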

UPDATE to address comments:

  1. In your java command you don't need the -ner.model /path/to/custom_model.ser.gz option.

  2. A .properties file can contain any number of property settings, one setting per line (blank lines are ignored, as are lines commented out with #).

  3. When you run a Java command, by default it looks for files in the directory you run the command from. So if your command includes -serverProperties server.properties, it will assume that the file server.properties is in the directory the command is run from. If you supply an absolute path instead (-serverProperties /path/to/server.properties), you can run the command from anywhere.

  4. So just to be clear you could start the server with this command (run in the folder with all the jars):

java -Xmx8g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000 -serverProperties server.properties

and server.properties should be a file like this:

ner.model = /path/to/custom_model.ser.gz

server.properties could also look like this:

annotators = tokenize,ssplit,pos,lemma,ner,depparse
ner.model = /path/to/custom_model.ser.gz
parse.maxlen = 100

That is just an example; you should put all of your settings into server.properties.

  5. I made some comments about accessing the StanfordCoreNLP server from Python in a previous answer:

cannot use pycorenlp for python3.5 through terminal

You appear to be using the pycorenlp library, which I don't really know. Two other options are the code I show in that answer and the stanza package we make. Details are in the answer linked above.
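To check from Python that the custom model is actually being applied, one option is to query the running server directly and look at the NER tags it returns. The sketch below is not the code from the linked answer. It uses only the standard library and assumes the server is listening on http://localhost:9000 as in the question, that it takes the request properties as a JSON string in the properties query parameter with the text as the POST body, and that its JSON output nests tokens under sentences with word and ner fields. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_request(text, properties, server_url="http://localhost:9000"):
    """Create a POST request: properties in the query string, text in the body."""
    query = urllib.parse.urlencode({"properties": json.dumps(properties)})
    return urllib.request.Request(
        f"{server_url}/?{query}", data=text.encode("utf-8"), method="POST"
    )


def entity_tags(annotation):
    """Return (word, NER tag) pairs for every token whose tag is not 'O'."""
    return [
        (token["word"], token["ner"])
        for sentence in annotation.get("sentences", [])
        for token in sentence.get("tokens", [])
        if token.get("ner", "O") != "O"
    ]


def annotate(text, server_url="http://localhost:9000"):
    """Send text to a running server and return its non-'O' NER tags."""
    props = {"annotators": "tokenize,ssplit,pos,lemma,ner", "outputFormat": "json"}
    with urllib.request.urlopen(build_request(text, props, server_url)) as response:
        return entity_tags(json.loads(response.read().decode("utf-8")))
```

If the labels your custom model was trained on show up in the result of annotate("some text containing your entities"), the server picked up the model from server.properties.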