Stanford NNDep parser: java.lang.ArrayIndexOutOfBo

Posted 2019-08-11 08:34

After training a model, I'm trying to parse the test treebank. Unfortunately, this error keeps popping up:

Loading depparse model file: nndep.model.txt.gz ...
###################
#Transitions: 77
#Labels: 38
ROOTLABEL: root
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 25
        at edu.stanford.nlp.parser.nndep.Classifier.preCompute(Classifier.java:663)
        at edu.stanford.nlp.parser.nndep.Classifier.preCompute(Classifier.java:637)
        at edu.stanford.nlp.parser.nndep.DependencyParser.initialize(DependencyParser.java:1151)
        at edu.stanford.nlp.parser.nndep.DependencyParser.loadModelFile(DependencyParser.java:589)
        at edu.stanford.nlp.parser.nndep.DependencyParser.loadModelFile(DependencyParser.java:493)
        at edu.stanford.nlp.parser.nndep.DependencyParser.main(DependencyParser.java:1245)

With the pre-trained English model that ships with the NLP package, the error does not appear, so something may be wrong with the trained model. There were no errors during training, however. I ran 500 iterations (the default of 20000 takes over 15 hours on my 2.33 GHz Core 2 Duo with 4 GB RAM; is that amount of time normal, by the way?). The train, dev and test sets are UD 1.2; the word embeddings used are these. The error seems to happen when a non-English treebank is used for training (I tried the Swedish and Polish UD treebanks; the -tlp option is not set, so UniversalEnglish is used).

1 answer
姐就是有狂的资本
#2 · 2019-08-11 09:02

Answering my own question, with a hint from a comment by @Jon Gauthier. It turns out that the -embeddingSize flag is also needed at the parsing stage if it was used during training (that is, if a value other than the default 50 was used). The documentation never says so, and in fact only mentions the flag in connection with the training phase, but the error message in the question actually hints, cryptically, at the origin of the error: it displays "25", which was the dimensionality of the word embeddings used.
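
To see why the exception carries the number 25, here is a minimal sketch of the failure mode. It is not the actual Stanford Classifier code: the class and method names below are made up for illustration, and the only assumption is that the loader indexes each embedding row using the configured size (default 50) rather than the real row length (25 in the trained model).

```java
// Hypothetical illustration only; not taken from the Stanford NNDep sources.
class EmbeddingSizeMismatch {

    /**
     * Reads every row up to the configured width, trusting the config value
     * instead of the real row length. Returns the first column index that
     * falls outside a row, or -1 if every access is in range.
     */
    static int firstBadIndex(double[][] embeddings, int configuredSize) {
        int k = 0;
        try {
            for (double[] row : embeddings) {
                for (k = 0; k < configuredSize; k++) {
                    double value = row[k]; // throws once k reaches row.length
                }
            }
            return -1;
        } catch (ArrayIndexOutOfBoundsException e) {
            return k; // the index that could not be read
        }
    }

    public static void main(String[] args) {
        // Three word vectors of dimensionality 25, as in the trained model.
        double[][] embeddings = new double[3][25];

        // Parsing with the default size of 50: the first invalid index is 25.
        System.out.println(firstBadIndex(embeddings, 50));

        // Parsing with the size that was used for training: no invalid index.
        System.out.println(firstBadIndex(embeddings, 25));
    }
}
```

The first out-of-range index equals the real row length, which is why the stack trace reports the dimensionality of the embeddings rather than the configured size.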
