I've been trying to use the command-line interface to train my model like this:
opennlp TokenNameFinderTrainer -model en-ner-pincode.bin -iterations 500 \ -lang en -data en-ner-pincode.train -encoding UTF-8
The console output is:
Number of parameters must be always be even
Usage: opennlp TokenNameFinderTrainer[.evalita|.ad|.conll03|.bionlp2004|.conll02|.muc6|.ontonotes|.brat] [-factory factoryName] [-resources resourcesDir] [-type modelType] [-featuregen featuregenFile] [-nameTypes types] [-sequenceCodec codec] [-params paramsFile] -lang language -model modelFile -data sampleData [-encoding charsetName]
It works fine if I don't include the number of iterations. Does anybody know the reason behind this?
Thanks!
Actually, the issue is that if anyone uses params, then iterations and cutoff are ignored. So in your case this info message is shown.
Resource Link:
UPDATE:
So, please use ChunkerTrainerME instead of TokenNameFinderTrainer.
Your command should look like the one below:
UPDATE2: Converting the data
I will use the Spanish data as a reference, but the same operations apply to Dutch. Just remember to change “-lang es” to “-lang nl” and use the correct training files. To convert the information to the OpenNLP format:
Optionally, you can convert the training test samples as well.
Training with Spanish data
To train the model for the name finder:
UPDATE3: Converting the data (optional)
To convert the information to the OpenNLP format:
Optionally, you can convert the training test samples as well.
Training with English data
You can train the model for the name finder this way:
If you have converted the data, then you can train the model for the name finder this way:
From "https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#intro.cli"
So the parameters are either things starting with "." or with "-", and there needs to be an even number of them. There are examples in the documentation that seem to agree with this.
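As a rough illustration of the even-count rule, here is a minimal shell sketch. check_pairs is a hypothetical helper, not part of OpenNLP; it only counts the tokens that follow the tool name, and the real check inside OpenNLP may do more than this. It assumes a POSIX shell.

```shell
#!/bin/sh
# check_pairs: hypothetical helper, not part of OpenNLP.
# Prints "even" when the number of arguments is even, "odd" otherwise.
check_pairs() {
    if [ $(( $# % 2 )) -eq 0 ]; then
        echo "even"
    else
        echo "odd"
    fi
}

# Five flag/value pairs: 10 arguments.
check_pairs -model en-ner-pincode.bin -iterations 500 -lang en \
    -data en-ner-pincode.train -encoding UTF-8

# The same arguments plus one stray token (a quoted backslash): 11 arguments.
check_pairs -model en-ner-pincode.bin -iterations 500 '\' -lang en \
    -data en-ner-pincode.train -encoding UTF-8
```

The first call prints even and the second prints odd, so a single unpaired token on the command line is enough to break the pairing.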
A short answer: iterations is not a parameter for TokenNameFinderTrainer. You can see that from the question, where the recognized parameters are listed in the console output.
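Since the usage text in the question does list -params paramsFile, one possible way to set the iteration count is through a parameters file. The following is only a sketch under assumptions: ner-params.txt is a hypothetical file name, the key names (Algorithm, Iterations, Cutoff) are assumed from the OpenNLP training-parameters format and should be checked against your version, and the Cutoff value is an arbitrary illustrative choice.

```shell
#!/bin/sh
# ner-params.txt is a hypothetical file name.
# Key names are assumed from the OpenNLP training-parameters format;
# verify them against the manual for your OpenNLP version.
cat > ner-params.txt <<'EOF'
Algorithm=MAXENT
Iterations=500
Cutoff=5
EOF

# Run the trainer only if the opennlp launcher is on the PATH.
# The model and data file names are the ones from the question.
if command -v opennlp >/dev/null 2>&1; then
    opennlp TokenNameFinderTrainer -params ner-params.txt -lang en \
        -model en-ner-pincode.bin -data en-ner-pincode.train -encoding UTF-8
fi
```

Every flag in this command appears in the usage text from the question, and each one has a value, so the argument count stays even.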