After training a model, I’m trying to parse the test treebank. Unfortunately, this error keeps popping up:
Loading depparse model file: nndep.model.txt.gz ...
###################
#Transitions: 77
#Labels: 38
ROOTLABEL: root
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 25
at edu.stanford.nlp.parser.nndep.Classifier.preCompute(Classifier.java:663)
at edu.stanford.nlp.parser.nndep.Classifier.preCompute(Classifier.java:637)
at edu.stanford.nlp.parser.nndep.DependencyParser.initialize(DependencyParser.java:1151)
at edu.stanford.nlp.parser.nndep.DependencyParser.loadModelFile(DependencyParser.java:589)
at edu.stanford.nlp.parser.nndep.DependencyParser.loadModelFile(DependencyParser.java:493)
at edu.stanford.nlp.parser.nndep.DependencyParser.main(DependencyParser.java:1245)
If the pre-trained English model that ships with the NLP package is used, the error does not appear, so perhaps something is wrong with my trained model? There were no errors during training, however. I ran 500 iterations (the default 20000 takes over 15 hours on my 2.33 GHz Core 2 Duo CPU with 4 GB RAM – is that amount of time normal, by the way?). Train, dev and test sets are from UD 1.2; the word embeddings used are these. The error seems to happen whenever a non-English treebank is used for training (tried Swedish and Polish UD; the -tlp option is not set, so UniversalEnglish is used).
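For reference, this is roughly how training and parsing were invoked (a sketch with placeholder file names; -maxIter 500 and -embeddingSize 25 match the setup described above):

    # Training: completes without errors.
    java -cp stanford-corenlp.jar edu.stanford.nlp.parser.nndep.DependencyParser \
      -trainFile sv-ud-train.conll -devFile sv-ud-dev.conll \
      -embedFile embeddings.txt -embeddingSize 25 \
      -maxIter 500 -model nndep.model.txt.gz

    # Parsing the test treebank: throws the exception above.
    java -cp stanford-corenlp.jar edu.stanford.nlp.parser.nndep.DependencyParser \
      -model nndep.model.txt.gz -testFile sv-ud-test.conll -outFile test-output.conll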
Answering my own question, with a hint from a comment by @Jon Gauthier: it turns out that the -embeddingSize flag is also needed at the parsing stage if it was used during training (i.e., if a value other than the default 50 was used). The documentation never says so, and in fact only mentions the flag in connection with training, but the error message in the question actually hints, rather cryptically, at the origin of the problem: the out-of-bounds index is 25, which was the dimensionality of the word embeddings used.