Why does the Stanford CoreNLP NER annotator load 3 models?


Question:

When I add the "ner" annotator to my StanfordCoreNLP pipeline, I can see that it loads three models, which takes a lot of time:

Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [10.3 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [10.1 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [6.5 sec].
Initializing JollyDayHoliday for SUTime from classpath: edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.holidays.sutime.txt

Is there a way to load only a subset of these models and get comparable results? In particular, I am unsure why it loads the 3-class and 4-class NER models when it already has the 7-class model, and I wonder whether skipping those two would still work.

Answer 1:

You can set which models are loaded in this manner:

Command line:

-ner.model model_path1,model_path2

Java code:

 props.put("ner.model", "model_path1,model_path2");

Where model_path1 and model_path2 should be something like:

"edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz"

The models are applied in layers: the first model runs and its tags are applied, then the second, then the third, and so on. If you want fewer models, you can list one or two instead of the default three, but this will change how the system performs.
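For instance, a two-model stack would look like this (layer order follows list order; this continues the props object from above):

    // the 3-class model runs first; the 7-class model runs second and,
    // under the default combination mode, will not overwrite tags the
    // 3-class model has already applied
    props.put("ner.model",
        "edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz,"
        + "edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz");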

If you set "ner.combinationMode" to HIGH_RECALL, all models are allowed to apply all of their tags. If you set it to NORMAL (the default), a later model cannot apply tags to tokens already labeled by an earlier model.
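For example, to let every layer keep all of its tags:

    // allow every model in the stack to apply all of its tags, even on
    // spans an earlier model has already labeled
    props.put("ner.combinationMode", "HIGH_RECALL");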

The three default models were trained on different data. For instance, the 3-class model was trained on substantially more data than the 7-class model. Each model therefore does something different, and their results are combined to produce the final tag sequence.