I have gold data where I annotated all room numbers from several documents. I want to use openNLP to train a model that uses this data and classify room numbers. I am stuck on where to start. I read openNLP maxent documentation, looked at examples in opennlp.tools and now looking at opennlp.tools.ml.maxent - it seems like it is something what I should be using, but still I have no idea on how to use. Can somebody give me some basic idea on how to use openNLP maxent and where to start with? Any help will be appreciated.
可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
回答1:
This is a minimal working example that demonstrates the usage of OpenNLP Maxent API.
It includes the following:
- Training a maxent model from data stored in a file.
- Storing the trained model into a file.
- Loading the trained model from a file.
- Using the model for classification.
- NOTE: the outcome is the first element in each training sample
- NOTE: the values can be arbitrary strings, e.g.
xyz=s0methIng
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import opennlp.maxent.GIS;
import opennlp.maxent.io.GISModelReader;
import opennlp.maxent.io.SuffixSensitiveGISModelWriter;
import opennlp.model.AbstractModel;
import opennlp.model.AbstractModelWriter;
import opennlp.model.DataIndexer;
import opennlp.model.DataReader;
import opennlp.model.FileEventStream;
import opennlp.model.MaxentModel;
import opennlp.model.OnePassDataIndexer;
import opennlp.model.PlainTextFileDataReader;
...
String trainingFileName = "training-file.txt";
String modelFileName = "trained-model.maxent.gz";
// Training a model from data stored in a file.
// The training file contains one training sample per line.
// Outcome (result) is the first element on each line.
// Example:
// result=1 a=1 b=1
// result=0 a=0 b=1
// ...
DataIndexer indexer = new OnePassDataIndexer( new FileEventStream(trainingFileName));
MaxentModel trainedMaxentModel = GIS.trainModel(100, indexer); // 100 iterations
// Storing the trained model into a file for later use (gzipped)
File outFile = new File(modelFileName);
AbstractModelWriter writer = new SuffixSensitiveGISModelWriter((AbstractModel) trainedMaxentModel, outFile);
writer.persist();
// Loading the gzipped model from a file
FileInputStream inputStream = new FileInputStream(modelFileName);
InputStream decodedInputStream = new GZIPInputStream(inputStream);
DataReader modelReader = new PlainTextFileDataReader(decodedInputStream);
MaxentModel loadedMaxentModel = new GISModelReader(modelReader).getModel();
// Now predicting the outcome using the loaded model
String[] context = {"a=1", "b=0"};
double[] outcomeProbs = loadedMaxentModel.eval(context);
String outcome = loadedMaxentModel.getBestOutcome(outcomeProbs);