This is a follow-up to the following question: How to generate custom training data for Stanford relation extraction
Thanks to StanfordNLPHelp, I am able to generate relation data using my custom NER model with RegexNER on top of it.
I had to run my custom model last, because otherwise many entities are mislabeled as ORGANIZATION, PERSON, etc.
Example custom NER classes:
"DEGREE", "DESG"
Example of relation training data:
0 ELECTEDBODY 0 O NNP/IN/NNP BOARD/OF/DIRECTORS O O O
0 ORGANIZATION 1 O NNP Board O O O
0 O 2 O NNS committees O O O
0 O 3 O JJ key O O O
0 ORGANIZATION 4 O NN/NN/NN/NN/NNP/NN N/Nomination/committee/A/Audit/committee O O O
0 O 5 O NN R O O O
0 MISC 6 O NN Remuneration O O O
0 O 7 O NN committee O O O
0 O 8 O NNP EFFECTIVE O O O
0 O 9 O NNP LEADERSHIP O O O
0 O 10 O CC AND O O O
0 O 11 O JJ STRONG O O O
0 O 12 O NN GOVERNANCE O O O
0 O 13 O NNP George O O O
0 O 14 O NNP Weston O O O
0 DESG 15 O NNP/NNP Chief/Executive O O O
0 O 16 O -LRB- -LRB- O O O
0 O 17 O NN age O O O
0 NUMBER 18 O CD 52 O O O
0 O 19 O -RRB- -RRB- O O O
0 PERSON 20 O NNP George O O O
0 O 21 O VBD was O O O
0 O 22 O VBN appointed O O O
0 O 23 O TO to O O O
0 O 24 O DT the O O O
0 ELECTEDBODY 25 O NN board O O O
0 DATE 26 O IN/CD in/1999 O O O
0 O 27 O CC and O O O
0 O 28 O VBD took O O O
0 O 29 O RP up O O O
0 O 30 O PRP$ his O O O
0 O 31 O JJ current O O O
0 O 32 O NN appointment O O O
0 O 33 O IN as O O O
0 DESG 34 O NNP/NNP Chief/Executive O O O
0 O 35 O IN in O O O
0 DATE 36 O NNP/CD April/2005 O O O
0 O 37 O . . O O O
20 34 cur_desg
20 36 cur_desg_from
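For reference, here is a minimal sketch of how a single line of this file splits into fields. The column meanings (sentence id, entity type, token index, POS, word, plus filler "O" columns) are inferred from the sample above, and the relation lines are read as "token index of argument 1, token index of argument 2, relation type" (e.g. 20 = George, 34 = Chief Executive). `Conll04LineSketch`, `TokenLine` and `RelationLine` are hypothetical helper names, not CoreNLP classes.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Sketch: splits one non-blank line of a CoNLL04-style training file.
 * Column meanings are inferred from the sample data; these are hypothetical
 * helper types, not Stanford CoreNLP classes.
 */
public class Conll04LineSketch {

    // Token line: sentenceId entityType tokenIndex O POS word O O O
    record TokenLine(int sentenceId, String entityType, int tokenIndex,
                     List<String> posTags, List<String> words) {}

    // Relation line: tokenIndexOfArg1 tokenIndexOfArg2 relationType
    record RelationLine(int arg1Index, int arg2Index, String type) {}

    /** Parses a non-blank line; blank separator lines are rejected (they have no fields). */
    static Object parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length == 9) {
            // Multi-token entities keep one POS tag and one word per token, joined with '/'
            return new TokenLine(Integer.parseInt(f[0]), f[1], Integer.parseInt(f[2]),
                    Arrays.asList(f[4].split("/")), Arrays.asList(f[5].split("/")));
        }
        if (f.length == 3) {
            return new RelationLine(Integer.parseInt(f[0]), Integer.parseInt(f[1]), f[2]);
        }
        throw new IllegalArgumentException(
                "Expected 9 or 3 columns, got " + f.length + ": " + line);
    }

    public static void main(String[] args) {
        System.out.println(parse("0 DESG 34 O NNP/NNP Chief/Executive O O O"));
        System.out.println(parse("20 34 cur_desg"));
    }
}
```

Keeping the multi-token words as a list makes it easy to check that the number of POS tags matches the number of words on each entity line.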
I am trying to train a custom relation model and have added my own relation classes.
For example: relation class **cur_desg** (current designation) between the entities (**PERSON, DESG**)
**Here is the relevant section of my properties file to train the relation classifier.**
datasetReaderClass = com.samrat.nlp.ie.re.CustomConllReader
entityClassifier = com.samrat.nlp.ie.re.CustomConllExtractor
relationResultsPrinters = com.samrat.nlp.ie.re.RelationResultPrinter
serializedTrainingSentencesPath = custom_relation_sentences.ser
serializedEntityExtractorPath = custom_relation_model.ser
serializedRelationExtractorPath = custom-relation-model-pipeline.ser
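Before launching training, it can help to confirm that the file actually defines these keys. This is a minimal sketch using only `java.util.Properties`; treating all six keys from the excerpt as mandatory is an assumption of the sketch, not a documented CoreNLP requirement, and `RelationPropsCheck` is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Properties;

public class RelationPropsCheck {

    // Keys taken from the properties excerpt above; requiring all of them
    // is an assumption of this sketch.
    static final List<String> EXPECTED_KEYS = List.of(
            "datasetReaderClass", "entityClassifier", "relationResultsPrinters",
            "serializedTrainingSentencesPath", "serializedEntityExtractorPath",
            "serializedRelationExtractorPath");

    /** Loads properties from text using standard .properties syntax. */
    static Properties fromString(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    /** Returns the expected keys that are absent or have an empty value. */
    static List<String> missingKeys(Properties p) {
        return EXPECTED_KEYS.stream()
                .filter(k -> p.getProperty(k, "").isBlank())
                .toList();
    }

    public static void main(String[] args) {
        Properties p = fromString("datasetReaderClass = com.samrat.nlp.ie.re.CustomConllReader\n");
        System.out.println("Missing: " + missingKeys(p));
    }
}
```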
Relevant section of Code CustomConllReader
private String getNormalizedNERTag(String ner) {
......
} else if(ner.equalsIgnoreCase("degree")) {
return "DEGREE";
}
else if(ner.equalsIgnoreCase("electedbody")) {
return "ELECTEDBODY";
}
...............
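As more custom classes are added, the if/else chain can be replaced with a lookup table. This is a hedged sketch, not the original method: it lists only the custom tags named in this question (DEGREE, DESG, ELECTEDBODY), does not reproduce the elided branches (a real version would presumably also need the standard tags such as PERSON and ORGANIZATION), and throwing on unknown tags is an assumption made so typos surface immediately.

```java
import java.util.Locale;
import java.util.Map;

public class NerTagNormalizer {

    // Hypothetical table: only the custom tags mentioned in the question.
    // The branches elided in the original method are not reproduced here.
    private static final Map<String, String> CUSTOM_TAGS = Map.of(
            "degree", "DEGREE",
            "desg", "DESG",
            "electedbody", "ELECTEDBODY");

    static String getNormalizedNERTag(String ner) {
        // Case-insensitive match, like equalsIgnoreCase in the original chain
        String tag = CUSTOM_TAGS.get(ner.toLowerCase(Locale.ROOT));
        if (tag == null) {
            // Assumption: fail loudly instead of silently mapping an unknown tag
            throw new IllegalArgumentException("Unknown NER tag: " + ner);
        }
        return tag;
    }

    public static void main(String[] args) {
        System.out.println(getNormalizedNERTag("ElectedBody"));
    }
}
```

Adding a new class then becomes a one-line change to the map.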
Problem 1: CustomConllReader throws an exception at the following line while reading the training data:
Span span = new Span(entity1.getExtentTokenStart(), entity2.getExtentTokenEnd());
Relevant portion of CustomConllReader (it is almost the same as RothCONLL04Reader):
case 3: // relation
System.out.println(currentLine);
String type = pieces.get(2);
List<ExtractionObject> args = new ArrayList<>();
EntityMention entity1 = indexToEntityMention.get(pieces.get(0));
EntityMention entity2 = indexToEntityMention.get(pieces.get(1));
args.add(entity1);
args.add(entity2);
Span span = new Span(entity1.getExtentTokenStart(), entity2.getExtentTokenEnd());
// identifier = "relation" + sentenceID + "-" + sentence.getAllRelations().size();
identifier = RelationMention.makeUniqueId();
RelationMention relationMention = new RelationMention(identifier,
sentence, span, type, null, args);
AnnotationUtils.addRelationMention(sentence, relationMention);
break;
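In this branch, `indexToEntityMention.get(...)` returns null when a relation line names a token index that has no entity mention in the sentence currently being read, and the failure only surfaces one line later as a bare NullPointerException in `entity1.getExtentTokenStart()`. A minimal sketch of a defensive lookup that reports the offending relation line instead; `RelationArgLookup.requireEntity` is a hypothetical helper, and plain strings stand in for `EntityMention` objects so the example runs on its own.

```java
import java.util.HashMap;
import java.util.Map;

public class RelationArgLookup {

    /**
     * Hypothetical helper: looks up a relation argument by its token-index key and
     * throws a descriptive exception instead of returning null, so a bad relation
     * line is reported where it is read.
     */
    static <E> E requireEntity(Map<String, E> indexToEntity, String index, String relationLine) {
        E entity = indexToEntity.get(index);
        if (entity == null) {
            throw new IllegalStateException("Relation line '" + relationLine
                    + "' refers to token index " + index
                    + ", but the current sentence has no entity mention there; known indices: "
                    + indexToEntity.keySet());
        }
        return entity;
    }

    public static void main(String[] args) {
        // Strings stand in for EntityMention objects in this self-contained demo
        Map<String, String> entities = new HashMap<>();
        entities.put("20", "PERSON George");
        entities.put("34", "DESG Chief Executive");
        System.out.println(requireEntity(entities, "20", "20 34 cur_desg"));
        try {
            requireEntity(entities, "2", "0 2 cur_desg");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Inside the reader, the two lookups could then become `requireEntity(indexToEntityMention, pieces.get(0), currentLine)` and the same call with `pieces.get(1)`.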
Exception
INFO: Reading file: tagged-training-relation-data-conll04.corp
20 34 cur_desg
20 36 cur_desg_from
0 2 cur_desg
Exception in thread "main" java.io.IOException
at edu.stanford.nlp.ie.machinereading.GenericDataSetReader.parse(GenericDataSetReader.java:138)
at com.wipro.nlp.ie.re.CustomConllReader.main(CustomConllReader.java:292)
Caused by: java.lang.NullPointerException
at com.wipro.nlp.ie.re.CustomConllReader.readSentence(CustomConllReader.java:144)
at com.wipro.nlp.ie.re.CustomConllReader.read(CustomConllReader.java:55)
at edu.stanford.nlp.ie.machinereading.GenericDataSetReader.parse(GenericDataSetReader.java:136)
... 1 more
The exception is thrown on sentence 3 while parsing the relation (0 2 cur_desg):
3 PERSON 0 O NNP/NNP John/Bason O O O
3 O 1 O NNP Finance O O O
3 ELECTEDBODY 2 O NNP Director O O O
3 O 3 O -LRB- -LRB- O O O
3 O 4 O NN age O O O
3 NUMBER 5 O CD 59 O O O
3 O 6 O -RRB- -RRB- O O O
3 PERSON 7 O NNP John O O O
3 O 8 O VBD was O O O
3 O 9 O VBN appointed O O O
3 O 10 O IN as O O O
3 O 11 O NNP Finance O O O
3 ELECTEDBODY 12 O NNP Director O O O
3 O 13 O IN in O O O
3 DATE 14 O NNP/CD May/1999 O O O
3 O 15 O . . O O O
0 2 cur_desg
0 14 cur_desg_from
This problem is solved: my training data had an extra line break in between. After removing it, I am able to build a custom relation classifier. But now, when I use that custom relation classifier, it does not recognize any custom NER tags or custom relations.
Separate question (how to make the custom relation classifier understand custom NER tags and relations in new sentences): Custom Relation Classifier does not understand any Custom NER tags and does not find any relations
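Because a stray blank line shifts relation lines into the wrong sentence, a quick structural check of the training file can catch it before training. This is a sketch under stated assumptions: the layout is the one visible in the samples (token lines, one blank line, relation lines, one blank line, so a sentence ends at its second blank line), entity mentions are keyed by the token-index column, and tokens tagged `O` are not entities. `Conll04StructureCheck` is a hypothetical name and not part of CoreNLP.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class Conll04StructureCheck {

    /** Returns human-readable problems; an empty list means nothing suspicious was found. */
    static List<String> check(List<String> lines) {
        List<String> problems = new ArrayList<>();
        Set<String> entityIndices = new HashSet<>(); // token indices of entities in the current sentence
        int blanks = 0;  // blank lines seen in the current sentence block
        int tokens = 0;  // token lines seen in the current sentence block

        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            int lineNo = i + 1;
            if (line.isEmpty()) {
                blanks++;
                if (blanks == 2) { // assumed: second blank line ends the sentence
                    if (tokens == 0) {
                        problems.add("line " + lineNo + ": sentence block without token lines");
                    }
                    entityIndices.clear();
                    blanks = 0;
                    tokens = 0;
                }
                continue;
            }
            String[] f = line.split("\\s+");
            if (f.length == 9) {
                if (blanks > 0) { // tokens should come before the token/relation separator
                    problems.add("line " + lineNo + ": token line after a blank line (extra line break?)");
                }
                tokens++;
                if (!f[1].equals("O")) {
                    entityIndices.add(f[2]);
                }
            } else if (f.length == 3) {
                for (int k = 0; k < 2; k++) {
                    if (!entityIndices.contains(f[k])) {
                        problems.add("line " + lineNo + ": relation '" + line + "' uses index "
                                + f[k] + ", which is not an entity in this sentence");
                    }
                }
            } else {
                problems.add("line " + lineNo + ": unexpected column count " + f.length);
            }
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        // e.g. java Conll04StructureCheck tagged-training-relation-data-conll04.corp
        for (String problem : check(Files.readAllLines(Path.of(args[0])))) {
            System.out.println(problem);
        }
    }
}
```

With an extra blank line between two sentences, the check reports the misplaced token line and the relation arguments that no longer resolve, which is the same situation that produced the NullPointerException above.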