Stanford CRFClassifier performance evaluation output


Question:

I'm following this FAQ, https://nlp.stanford.edu/software/crf-faq.shtml, to train my own classifier, and I noticed that the performance evaluation output does not match the actual results (or at least not in the way I expect). Specifically, this section:

CRFClassifier tagged 16119 words in 1 documents at 13824.19 words per second.

    Entity    P       R       F1      TP   FP   FN
    MYLABEL   1.0000  0.9961  0.9980  255  0    1
    Totals    1.0000  0.9961  0.9980  255  0    1

I expect TP to be all instances where the predicted label matches the gold label, FP to be all instances where MYLABEL was predicted but the gold label was O, and FN to be all instances where O was predicted but the gold label was MYLABEL.
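In rough code, the token-level counting I expected looks something like this sketch (MYLABEL stands in for my actual label; this is my own approximation, not Stanford's code):

    def token_counts(gold, pred, label="MYLABEL"):
        # gold, pred: parallel lists of per-token labels
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        return tp, fp, fn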

If I calculate those numbers myself from the program's output, I get completely different numbers, with no apparent relation to what the program prints. I've tried this with various test files. I'm using Stanford NER v3.7.0 (2016-10-31).

Am I missing something?

Answer 1:

The F1 scores are computed over entities, not over individual token labels.

Example:

(Joe, PERSON) (Smith, PERSON) (went, O) (to, O) (Hawaii, LOCATION) (., O)

In this example there are two entities:

Joe Smith   PERSON
Hawaii      LOCATION

Entities are created by taking all adjacent tokens with the same label (unless you use a more elaborate BIO labeling scheme; BIO schemes have tags like B-PERSON and I-PERSON to indicate whether a token begins an entity or continues one).
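As a rough sketch of how entity-level scoring works (this approximates the idea, not Stanford's actual implementation): adjacent tokens sharing a non-O label are collapsed into spans, and a span only counts as a true positive if it matches a gold span exactly:

    def to_entities(tagged):
        # Collapse runs of adjacent tokens with the same non-O label
        # into (label, start, end) spans; tagged is a list of (token, label).
        entities, start = set(), 0
        for i, (_, label) in enumerate(tagged):
            prev = tagged[i - 1][1] if i > 0 else "O"
            if label != prev:
                if prev != "O":
                    entities.add((prev, start, i))
                start = i
        if tagged and tagged[-1][1] != "O":
            entities.add((tagged[-1][1], start, len(tagged)))
        return entities

    gold = [("Joe", "PERSON"), ("Smith", "PERSON"), ("went", "O"),
            ("to", "O"), ("Hawaii", "LOCATION"), (".", "O")]
    pred = [("Joe", "PERSON"), ("Smith", "O"), ("went", "O"),
            ("to", "O"), ("Hawaii", "LOCATION"), (".", "O")]

    g, p = to_entities(gold), to_entities(pred)
    tp, fp, fn = len(g & p), len(p - g), len(g - p)
    print(tp, fp, fn)  # 1 1 1: truncating "Joe Smith" to "Joe" costs both an FP and an FN

This is why token-level counts computed from the per-word output won't line up with the printed table: a single mislabeled token inside an entity can simultaneously create a false positive and a false negative at the entity level.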