Stanford NER: Can I use two classifiers at once in

Posted 2019-03-22 05:51

In my code, the first classifier gives me Person recognition. For the second classifier, which I built myself, I added some words to be annotated as Organization, but it does not annotate Person.

I need the benefit of both classifiers. How can I do that?

I'm using NetBeans, and this is the code:

String serializedClassifier = "classifiers/english.all.3class.distsim.crf.ser.gz";
String serializedClassifier2 = "/Users/ha/stanford-ner-2014-10-26/classifiers/dept-model.ser.gz";

if (args.length > 0) {
  serializedClassifier = args[0];
}

AbstractSequenceClassifier<CoreLabel> classifier = CRFClassifier.getClassifier(serializedClassifier);
AbstractSequenceClassifier<CoreLabel> classifier2 = CRFClassifier.getClassifier(serializedClassifier2);

  String fileContents = IOUtils.slurpFile("/Users/ha/NetBeansProjects/NERtry/src/nertry/input.txt");
  List<List<CoreLabel>> out = classifier.classify(fileContents);
  List<List<CoreLabel>> out2 = classifier2.classify(fileContents);

  // Print the labels assigned by the stock 3-class model.
  for (List<CoreLabel> sentence : out) {
    System.out.print("\nenglish.all.3class.distsim.crf.ser.gz: ");
    for (CoreLabel word : sentence) {
      System.out.print(word.word() + '/' + word.get(CoreAnnotations.AnswerAnnotation.class) + ' ');
    }
  }

  // Print the labels assigned by the custom model.
  for (List<CoreLabel> sentence2 : out2) {
    System.out.print("\ndept-model.ser.gz");
    for (CoreLabel word2 : sentence2) {
      System.out.print(word2.word() + '/' + word2.get(CoreAnnotations.AnswerAnnotation.class) + ' ');
    }
    System.out.println();
  }

The problem comes from the result I get:

english.all.3class.distsim.crf.ser.gz: What/O date/O did/O James/PERSON started/O his/O job/O in/O Human/O and/O Finance/O ?/O 
dept-model.ser.gzWhat/O date/O did/O James/ORGANIZATION started/O his/O job/O in/O Human/ORGANIZATION and/O Finance/ORGANIZATION ?/O 

The second classifier recognizes the names as ORGANIZATION, but I need them to be annotated as PERSON. Any help?

2 Answers
Evening l夕情丶
#2 · 2019-03-22 06:00

I'm not quite sure what the question here is. You already have the output of two classifiers. Perhaps this is more of a Java question, i.e. how you can iterate over both sentences at the same time:

// "out" and "out2" are the two result lists from the question.
Iterator<List<CoreLabel>> it1 = out.iterator();
Iterator<List<CoreLabel>> it2 = out2.iterator();
while (it1.hasNext() && it2.hasNext()) {
   List<CoreLabel> sentence1 = it1.next();
   List<CoreLabel> sentence2 = it2.next();  // advance the second iterator, not the first
   Iterator<CoreLabel> sentence1It = sentence1.iterator();
   Iterator<CoreLabel> sentence2It = sentence2.iterator();
   // This pairing assumes both classifiers split the text into the same tokens.
   while (sentence1It.hasNext() && sentence2It.hasNext()) {
       CoreLabel word1 = sentence1It.next();
       CoreLabel word2 = sentence2It.next();
       System.out.print("\nenglish.all.3class.distsim.crf.ser.gz: ");
       System.out.print(word1.word() + '/' +
         word1.get(CoreAnnotations.AnswerAnnotation.class) + ' ');
       System.out.print("\ndept-model.ser.gz");
       System.out.print(word2.word() + '/' +
         word2.get(CoreAnnotations.AnswerAnnotation.class) + ' ');
   }
   System.out.println();
}
小情绪 Triste *
#3 · 2019-03-22 06:10

The class that makes this easy is NERClassifierCombiner. It runs the classifiers in the order you specify them, from left to right (the constructor accepts any number of them). A later classifier cannot annotate an entity that overlaps with an entity tagged by an earlier classifier, but is otherwise free to add annotations. So earlier classifiers are preferred in a simple preference ranking. I give a complete code example below.

(If you are training all your own classifiers, it is generally best to train all the entities together, so they can influence each other in the categories assigned. But this simple preference ordering usually works pretty well, and we use it ourselves.)

import edu.stanford.nlp.ie.NERClassifierCombiner;
import edu.stanford.nlp.io.IOUtils;
import edu.stanford.nlp.ling.CoreLabel;

import java.io.IOException;
import java.util.List;

public class MultipleNERs {

  public static void main(String[] args) throws IOException {
    String serializedClassifier = "classifiers/english.all.3class.distsim.crf.ser.gz";
    String serializedClassifier2 = "classifiers/english.muc.7class.distsim.crf.ser.gz";

    if (args.length > 0) {
      serializedClassifier = args[0];
    }

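    // The two boolean flags are applyNumericClassifiers and useSUTime; both are off here.
    // The classifiers are listed in order of preference.
    NERClassifierCombiner classifier = new NERClassifierCombiner(false, false,
            serializedClassifier, serializedClassifier2);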

    String fileContents = IOUtils.slurpFile("input.txt");
    List<List<CoreLabel>> out = classifier.classify(fileContents);

    int i = 0;
    for (List<CoreLabel> lcl : out) {
      i++;
      int j = 0;
      for (CoreLabel cl : lcl) {
        j++;
        System.out.printf("%d:%d: %s%n", i, j,
                cl.toShorterString("Text", "CharacterOffsetBegin", "CharacterOffsetEnd", "NamedEntityTag"));
      }
    }
  }

}
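
To make the preference ranking described above concrete, here is a small stand-alone sketch that does not use the Stanford libraries at all. It is only an illustration of the rule as stated in this answer, not the actual NERClassifierCombiner implementation, and it makes some simplifying assumptions: both label sequences cover the same tokens, and an "entity" is a maximal run of identical non-O labels. The class name PreferenceMergeSketch and the method merge are hypothetical names chosen for this example. The input labels are taken from the output shown in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PreferenceMergeSketch {

  static final String OUTSIDE = "O";

  // Hypothetical helper: merges two label sequences for the same tokens.
  // Labels from "primary" are always kept. An entity from "secondary"
  // (assumed here to be a maximal run of identical non-O labels) is copied
  // only if none of its tokens already carries a label in "primary".
  static List<String> merge(List<String> primary, List<String> secondary) {
    if (primary.size() != secondary.size()) {
      throw new IllegalArgumentException("Label sequences must have the same length");
    }
    List<String> merged = new ArrayList<>(primary);
    int i = 0;
    while (i < secondary.size()) {
      String label = secondary.get(i);
      if (label.equals(OUTSIDE)) {
        i++;
        continue;
      }
      // Find the end of this entity and check for overlap with the primary labels.
      int end = i;
      boolean overlaps = false;
      while (end < secondary.size() && secondary.get(end).equals(label)) {
        if (!primary.get(end).equals(OUTSIDE)) {
          overlaps = true;
        }
        end++;
      }
      if (!overlaps) {
        for (int k = i; k < end; k++) {
          merged.set(k, label);
        }
      }
      i = end;
    }
    return merged;
  }

  public static void main(String[] args) {
    List<String> tokens = Arrays.asList("What", "date", "did", "James", "started",
        "his", "job", "in", "Human", "and", "Finance", "?");
    // Labels from the first (preferred) classifier in the question.
    List<String> first = Arrays.asList("O", "O", "O", "PERSON", "O",
        "O", "O", "O", "O", "O", "O", "O");
    // Labels from the second (custom) classifier in the question.
    List<String> second = Arrays.asList("O", "O", "O", "ORGANIZATION", "O",
        "O", "O", "O", "ORGANIZATION", "O", "ORGANIZATION", "O");

    List<String> merged = merge(first, second);
    for (int i = 0; i < tokens.size(); i++) {
      System.out.print(tokens.get(i) + '/' + merged.get(i) + ' ');
    }
    System.out.println();
  }
}
```

With the labels from the question, James keeps PERSON from the first classifier, while Human and Finance pick up ORGANIZATION from the second one, because the first classifier left them unlabeled.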