I am looking on standford corenlp using the Named Entity REcognizer.I have different kinds of input text and i need to tag it into my own Entity.So i started training my own model and it doesnt seems to be working.
For eg: my input text string is "Book of 49 Magazine Articles on Toyota Land Cruiser 1956-1987 Gold Portfolio http://t.co/EqxmY1VmLg http://t.co/F0Vefuoj9Q"
I go through the examples to train my own models and and look for only some words that I am interested in.
My jane-austen-emma-ch1.tsv looks like this
Toyota PERS
Land Cruiser PERS
From the above input text i am only interested in those two words. The one is Toyota and the other word is Land Cruiser.
The austin.prop look like this
trainFile = jane-austen-emma-ch1.tsv
serializeTo = ner-model.ser.gz
map = word=0,answer=1
useClassFeature=true
useWord=true
useNGrams=true
noMidNGrams=true
useDisjunctive=true
maxNGramLeng=6
usePrev=true
useNext=true
useSequences=true
usePrevSequences=true
maxLeft=1
useTypeSeqs=true
useTypeSeqs2=true
useTypeySequences=true
wordShape=chris2useLC
Run the following command to generate the ner-model.ser.gz file
java -cp stanford-corenlp-3.4.1.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop austen.prop
public static void main(String[] args) {
String serializedClassifier = "edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz";
String serializedClassifier2 = "C:/standford-ner/ner-model.ser.gz";
try {
NERClassifierCombiner classifier = new NERClassifierCombiner(false, false,
serializedClassifier2,serializedClassifier);
String ss = "Book of 49 Magazine Articles on Toyota Land Cruiser 1956-1987 Gold Portfolio http://t.co/EqxmY1VmLg http://t.co/F0Vefuoj9Q";
System.out.println("---");
List<List<CoreLabel>> out = classifier.classify(ss);
for (List<CoreLabel> sentence : out) {
for (CoreLabel word : sentence) {
System.out.print(word.word() + '/' + word.get(AnswerAnnotation.class) + ' ');
}
System.out.println();
}
} catch (ClassCastException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
Here is the output I am getting
Book/PERS of/PERS 49/O Magazine/PERS Articles/PERS on/O Toyota/PERS Land/PERS Cruiser/PERS 1956-1987/PERS Gold/O Portfolio/PERS http://t.co/EqxmY1VmLg/PERS http://t.co/F0Vefuoj9Q/PERS
which i think its wrong.I am looking for Toyota/PERS and Land Cruiser/PERS(Which is a multi valued fied.
Thanks for the Help.Any help is really appreciated.