ConllReader (like RothCONLL04Reader) throws exception

Posted 2019-09-16 10:06

Question:

This continues the following question: How to generate custom training data for Stanford relation extraction

Thanks to StanfordNLPHelp, I am able to generate relation data with a custom NER model and, on top of it, RegexNER.

I had to run my custom model last, because otherwise many ORGANIZATION, PERSON, etc. entities are misclassified.
Example custom NER classes:

"DEGREE", "DESG"

Example of relation training data:

0   ELECTEDBODY 0   O   NNP/IN/NNP  BOARD/OF/DIRECTORS  O   O   O
0   ORGANIZATION    1   O   NNP Board   O   O   O
0   O   2   O   NNS committees  O   O   O
0   O   3   O   JJ  key O   O   O
0   ORGANIZATION    4   O   NN/NN/NN/NN/NNP/NN  N/Nomination/committee/A/Audit/committee    O   O   O
0   O   5   O   NN  R   O   O   O
0   MISC    6   O   NN  Remuneration    O   O   O
0   O   7   O   NN  committee   O   O   O
0   O   8   O   NNP EFFECTIVE   O   O   O
0   O   9   O   NNP LEADERSHIP  O   O   O
0   O   10  O   CC  AND O   O   O
0   O   11  O   JJ  STRONG  O   O   O
0   O   12  O   NN  GOVERNANCE  O   O   O
0   O   13  O   NNP George  O   O   O
0   O   14  O   NNP Weston  O   O   O
0   DESG    15  O   NNP/NNP Chief/Executive O   O   O
0   O   16  O   -LRB-   -LRB-   O   O   O
0   O   17  O   NN  age O   O   O
0   NUMBER  18  O   CD  52  O   O   O
0   O   19  O   -RRB-   -RRB-   O   O   O
0   PERSON  20  O   NNP George  O   O   O
0   O   21  O   VBD was O   O   O
0   O   22  O   VBN appointed   O   O   O
0   O   23  O   TO  to  O   O   O
0   O   24  O   DT  the O   O   O
0   ELECTEDBODY 25  O   NN  board   O   O   O
0   DATE    26  O   IN/CD   in/1999 O   O   O
0   O   27  O   CC  and O   O   O
0   O   28  O   VBD took    O   O   O
0   O   29  O   RP  up  O   O   O
0   O   30  O   PRP$    his O   O   O
0   O   31  O   JJ  current O   O   O
0   O   32  O   NN  appointment O   O   O
0   O   33  O   IN  as  O   O   O
0   DESG    34  O   NNP/NNP Chief/Executive O   O   O
0   O   35  O   IN  in  O   O   O
0   DATE    36  O   NNP/CD  April/2005  O   O   O
0   O   37  O   .   .   O   O   O

20  34  cur_desg 
20  36  cur_desg_from
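
To make the layout of the sample above easier to follow, here is a minimal sketch that labels a line by its number of whitespace-separated fields. It assumes, from the rows shown here and from the `case 3: // relation` branch of the reader, that 9 fields mean a token row and 3 fields mean a relation row. The class `RelationDataLine` and the column meanings (sentence ID, NER tag, token index, POS, text) are illustrative readings of the sample, not part of Stanford CoreNLP.

```java
import java.util.Arrays;
import java.util.List;

public class RelationDataLine {

    /** Splits one line of the training file on runs of whitespace. */
    static List<String> pieces(String line) {
        return Arrays.asList(line.trim().split("\\s+"));
    }

    /** Describes a line by its field count: 9 = token row, 3 = relation row (assumed layout). */
    static String describe(String line) {
        if (line.trim().isEmpty()) {
            return "blank";
        }
        List<String> p = pieces(line);
        switch (p.size()) {
            case 9:
                // Columns 3, 6, 7 and 8 are "O" in every sample row, so they are skipped here.
                return "token sentence=" + p.get(0) + " ner=" + p.get(1)
                        + " index=" + p.get(2) + " pos=" + p.get(4) + " text=" + p.get(5);
            case 3:
                // The two numbers are token indices of the argument entities.
                return "relation arg1=" + p.get(0) + " arg2=" + p.get(1) + " type=" + p.get(2);
            default:
                return "unexpected field count " + p.size();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("0   DESG    34  O   NNP/NNP Chief/Executive O   O   O"));
        System.out.println(describe("20  34  cur_desg "));
    }
}
```

Running a check like this over the whole file is a quick way to spot rows whose field count is neither 9 nor 3.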

I am trying to train a custom relation model and have added my custom relation classes.

For example, the relation class **cur_desg** (current designation) holds between entities (**PERSON, DESG**).
**Here is the relevant section of my properties file for training the relation classifier.**

datasetReaderClass = com.samrat.nlp.ie.re.CustomConllReader
entityClassifier = com.samrat.nlp.ie.re.CustomConllExtractor
relationResultsPrinters = com.samrat.nlp.ie.re.RelationResultPrinter

serializedTrainingSentencesPath = custom_relation_sentences.ser
serializedEntityExtractorPath = custom_relation_model.ser
serializedRelationExtractorPath = custom-relation-model-pipeline.ser

Relevant section of the CustomConllReader code:

private String getNormalizedNERTag(String ner) {
        ......
        }  else if(ner.equalsIgnoreCase("degree")) {
            return "DEGREE";
        }
        else if(ner.equalsIgnoreCase("electedbody")) {
            return "ELECTEDBODY";
        }
...............

Problem 1: CustomConllReader throws an exception at the following line while reading the training data.

Span span = new Span(entity1.getExtentTokenStart(), entity2.getExtentTokenEnd());

Relevant portion of CustomConllReader (it is almost the same as RothCONLL04Reader):

case 3: // relation
    System.out.println(currentLine);
    String type = pieces.get(2);
    List<ExtractionObject> args = new ArrayList<>();
    EntityMention entity1 = indexToEntityMention.get(pieces.get(0));
    EntityMention entity2 = indexToEntityMention.get(pieces.get(1));
    args.add(entity1);
    args.add(entity2);
    Span span = new Span(entity1.getExtentTokenStart(), entity2.getExtentTokenEnd());
    // identifier = "relation" + sentenceID + "-" + sentence.getAllRelations().size();
    identifier = RelationMention.makeUniqueId();
    RelationMention relationMention = new RelationMention(identifier,
            sentence, span, type, null, args);
    AnnotationUtils.addRelationMention(sentence, relationMention);
    break;

Exception

    INFO: Reading file: tagged-training-relation-data-conll04.corp
20  34  cur_desg 
20  36  cur_desg_from
0   2   cur_desg
Exception in thread "main" java.io.IOException
    at edu.stanford.nlp.ie.machinereading.GenericDataSetReader.parse(GenericDataSetReader.java:138)
    at com.wipro.nlp.ie.re.CustomConllReader.main(CustomConllReader.java:292)
Caused by: java.lang.NullPointerException
    at com.wipro.nlp.ie.re.CustomConllReader.readSentence(CustomConllReader.java:144)
    at com.wipro.nlp.ie.re.CustomConllReader.read(CustomConllReader.java:55)
    at edu.stanford.nlp.ie.machinereading.GenericDataSetReader.parse(GenericDataSetReader.java:136)
    ... 1 more
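
The NullPointerException is consistent with one of the `indexToEntityMention.get(...)` lookups returning null, although the trace alone does not prove that. Under that assumption, a guarded lookup could replace the bare NPE with a message that names the offending relation line. The sketch below is self-contained and uses a plain `Map` in place of the reader's map; `MentionLookup` and `require` are hypothetical names, not CoreNLP API.

```java
import java.util.HashMap;
import java.util.Map;

public class MentionLookup {

    /**
     * Returns the mention stored under the given token index, or throws an
     * exception that names the relation line instead of a bare NullPointerException.
     */
    static <T> T require(Map<String, T> indexToMention, String index, String relationLine) {
        T mention = indexToMention.get(index);
        if (mention == null) {
            throw new IllegalStateException("No entity at token index " + index
                    + " for relation line '" + relationLine.trim()
                    + "'; known indices: " + indexToMention.keySet());
        }
        return mention;
    }

    public static void main(String[] args) {
        // Stand-in for indexToEntityMention: token index -> entity label.
        Map<String, String> mentions = new HashMap<>();
        mentions.put("0", "PERSON");
        try {
            require(mentions, "2", "0   2   cur_desg");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the reader, the same check would wrap both lookups before `new Span(...)` is built, so a malformed block fails with the relation line in the message.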

The exception is thrown on sentence 3 while parsing the relation (0 2 cur_desg):

3   PERSON  0   O   NNP/NNP John/Bason  O   O   O
3   O   1   O   NNP Finance O   O   O
3   ELECTEDBODY 2   O   NNP Director    O   O   O
3   O   3   O   -LRB-   -LRB-   O   O   O
3   O   4   O   NN  age O   O   O
3   NUMBER  5   O   CD  59  O   O   O
3   O   6   O   -RRB-   -RRB-   O   O   O
3   PERSON  7   O   NNP John    O   O   O
3   O   8   O   VBD was O   O   O
3   O   9   O   VBN appointed   O   O   O
3   O   10  O   IN  as  O   O   O
3   O   11  O   NNP Finance O   O   O
3   ELECTEDBODY 12  O   NNP Director    O   O   O
3   O   13  O   IN  in  O   O   O
3   DATE    14  O   NNP/CD  May/1999    O   O   O
3   O   15  O   .   .   O   O   O

0   2   cur_desg
0   14  cur_desg_from

This problem is solved: my training data had an extra line break in between. I am now able to build a custom relation classifier, but when I use it, it does not recognize any custom NER tags or custom relations.
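
A stray line break of this kind can be caught before training with a small check. The sketch below assumes that "extra line break" means a blank line directly following another blank line. It reports the 1-based line number of each such line. `BlankLineCheck` is a hypothetical helper, not part of CoreNLP.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BlankLineCheck {

    /** Returns the 1-based numbers of every blank line that directly follows another blank line. */
    static List<Integer> extraBlankLines(List<String> lines) {
        List<Integer> extras = new ArrayList<>();
        boolean previousBlank = false;
        for (int i = 0; i < lines.size(); i++) {
            boolean blank = lines.get(i).trim().isEmpty();
            if (blank && previousBlank) {
                extras.add(i + 1);
            }
            previousBlank = blank;
        }
        return extras;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "3   DATE    14  O   NNP/CD  May/1999    O   O   O",
                "3   O   15  O   .   .   O   O   O",
                "",
                "",                       // the extra line break
                "0   2   cur_desg",
                "0   14  cur_desg_from");
        System.out.println("Extra blank lines at: " + extraBlankLines(lines));
    }
}
```

For a real file, the list could come from `java.nio.file.Files.readAllLines(path)`.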

I have asked a separate question about making the custom relation classifier recognize custom NER tags and relations in new sentences: Custom Relation Classifier does not understand any Custom NER tags and does not find any relations

Answer 1:

The exception was thrown because of an extra line break in the training data. There must be exactly two line breaks in the tagged training data, as shown below.

3   PERSON  0   O   NNP/NNP John/Bason  O   O   O
3   O   1   O   NNP Finance O   O   O
3   ELECTEDBODY 2   O   NNP Director    O   O   O
3   O   3   O   -LRB-   -LRB-   O   O   O
3   O   4   O   NN  age O   O   O
3   NUMBER  5   O   CD  59  O   O   O
3   O   6   O   -RRB-   -RRB-   O   O   O
3   PERSON  7   O   NNP John    O   O   O
3   O   8   O   VBD was O   O   O
3   O   9   O   VBN appointed   O   O   O
3   O   10  O   IN  as  O   O   O
3   O   11  O   NNP Finance O   O   O
3   ELECTEDBODY 12  O   NNP Director    O   O   O
3   O   13  O   IN  in  O   O   O
3   DATE    14  O   NNP/CD  May/1999    O   O   O
3   O   15  O   .   .   O   O   O

0   2   cur_desg
0   14  cur_desg_from

5   O   0   O   PRP He  O   O   O
5   O   1   O   VBD was O   O   O
5   O   2   O   RB  previously  O   O   O
5   O   3   O   DT  the O   O   O
5   O   4   O   NN  finance O   O   O
5   DESG    5   O   NN  director    O   O   O
5   O   6   O   IN  of  O   O   O
5   ORGANIZATION    7   O   NNP Bunzl   O   O   O
5   O   8   O   NN  plc O   O   O
5   O   9   O   CC  and O   O   O
5   O   10  O   VBZ is  O   O   O