Dates when using StanfordCoreNLP pipeline

Posted 2019-06-09 07:33

If I create an AnnotationPipeline with a TokenizerAnnotator, WordsToSentencesAnnotator, POSTaggerAnnotator, and sutime, I get TimexAnnotations attached to the resulting annotation.

But if I create a StanfordCoreNLP pipeline with the "annotators" property set to "tokenize, ssplit, pos, lemma, ner", I don't get TimexAnnotations even though the relevant individual tokens are NER-tagged as DATE.

Why is there this difference?

2 answers
劫难
#2 · 2019-06-09 08:27

When we run annotations, we extract all entity mentions from the document, and we consider a DATE to be an entity mention. Here is some sample code. I've added some commented-out options in case you only want to extract time expressions and want the TimexAnnotations.class field to be populated.

package edu.stanford.nlp.examples;

import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.util.*;
import edu.stanford.nlp.time.TimeAnnotations;

import edu.stanford.nlp.pipeline.*;

import java.util.*;

public class SUTimeExample {

  public static void main(String[] args) {
    Annotation document =
        new Annotation("The date is 1 April 2017");
    Properties props = new Properties();
    //props.setProperty("customAnnotatorClass.time", "edu.stanford.nlp.time.TimeAnnotator");
    //props.setProperty("annotators", "tokenize,ssplit,pos,lemma,time");
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,entitymentions");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    pipeline.annotate(document);
    // Print only the entity mentions typed as DATE; the constant-first equals avoids a null dereference.
    for (CoreMap entityMention : document.get(CoreAnnotations.MentionsAnnotation.class)) {
      if ("DATE".equals(entityMention.get(CoreAnnotations.EntityTypeAnnotation.class)))
        System.out.println(entityMention);
    }
  }
}
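
The sample above switches between two setups by commenting lines in and out. As a minimal sketch, the same choice can be made with a small helper that returns the matching Properties. The class name SUTimeProps and the method build are hypothetical names introduced here for illustration; the property keys and values are copied from the sample, and the block uses only java.util so it runs without CoreNLP on the classpath.

```java
import java.util.Properties;

public class SUTimeProps {

  /**
   * Hypothetical helper: builds the properties for one of the two setups in the sample.
   * standaloneTime = true  mirrors the commented-out lines (SUTime registered as a custom "time" annotator).
   * standaloneTime = false mirrors the active line (ner followed by entitymentions).
   */
  public static Properties build(boolean standaloneTime) {
    Properties props = new Properties();
    if (standaloneTime) {
      props.setProperty("customAnnotatorClass.time", "edu.stanford.nlp.time.TimeAnnotator");
      props.setProperty("annotators", "tokenize,ssplit,pos,lemma,time");
    } else {
      props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,entitymentions");
    }
    return props;
  }

  public static void main(String[] args) {
    // Show the annotator list chosen for each setup.
    System.out.println(build(true).getProperty("annotators"));
    System.out.println(build(false).getProperty("annotators"));
  }
}
```

The returned object would be passed to new StanfordCoreNLP(props) exactly as in the sample. Which annotation fields end up populated in each setup depends on the CoreNLP version, so treat this as a way to try both configurations side by side rather than a statement of what either one produces.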
劫难
#3 · 2019-06-09 08:29

When I run this command:

java -Xmx8g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner -file data-example.txt -outputFormat text

I get TIMEX annotations for the DATE. The ner annotator should be applying SUTime by default.
