I built a Naive Bayes classifier in the Weka GUI and saved the model by following this tutorial. Now I want to load this model from Java code, but I cannot find a way to load a saved model using Weka.
My requirement is that the model is built separately and then used in a separate program.
If anyone can guide me in this regard, I would be thankful.
You can easily load a saved model in Java using this command:
Classifier myCls = (Classifier) weka.core.SerializationHelper.read(pathToModel);
For a complete workflow in Java I wrote the following article in SO Documentation, now copied here:
Text Classification in Weka
Text Classification with LibLinear
Create training instances from .arff file
private static Instances getDataFromFile(String path) throws Exception{
DataSource source = new DataSource(path);
Instances data = source.getDataSet();
if (data.classIndex() == -1){
data.setClassIndex(data.numAttributes()-1);
//last attribute as class index
}
return data;
}
Instances trainingData = getDataFromFile(pathToArffFile);
Use StringToWordVector to transform your string attributes to number representation:
StringToWordVector filter = new StringToWordVector();
filter.setWordsToKeep(1000000);
if(useIdf){
filter.setIDFTransform(true);
}
filter.setTFTransform(true);
filter.setLowerCaseTokens(true);
filter.setOutputWordCounts(true);
filter.setMinTermFreq(minTermFreq);
filter.setNormalizeDocLength(new SelectedTag(StringToWordVector.FILTER_NORMALIZE_ALL,StringToWordVector.TAGS_FILTER));
NGramTokenizer t = new NGramTokenizer();
t.setNGramMaxSize(maxGrams);
t.setNGramMinSize(minGrams);
filter.setTokenizer(t);
WordsFromFile stopwords = new WordsFromFile();
stopwords.setStopwords(new File("data/stopwords/stopwords.txt"));
filter.setStopwordsHandler(stopwords);
if (useStemmer){
Stemmer s = new /*Iterated*/LovinsStemmer();
filter.setStemmer(s);
}
filter.setInputFormat(trainingData);
Apply the filter to trainingData: trainingData = Filter.useFilter(trainingData, filter);
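To make the effect of setNGramMinSize and setNGramMaxSize in the filter setup above more concrete, here is a minimal plain-Java sketch of word n-gram extraction. It is not Weka's NGramTokenizer: the class NGramSketch and the method ngrams are made up for illustration, and the order in which Weka emits n-grams may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative stand-in, not Weka's NGramTokenizer.
class NGramSketch {

    // Returns all contiguous word n-grams with minSize <= n <= maxSize.
    static List<String> ngrams(String text, int minSize, int maxSize) {
        List<String> result = new ArrayList<>();
        String trimmed = text.trim().toLowerCase();
        if (trimmed.isEmpty()) {
            return result; // nothing to tokenize
        }
        String[] words = trimmed.split("\\s+");
        for (int n = minSize; n <= maxSize; n++) {
            for (int start = 0; start + n <= words.length; start++) {
                // Join the n words starting at 'start' into one feature string.
                result.add(String.join(" ", Arrays.copyOfRange(words, start, start + n)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // With minSize = 1 and maxSize = 2 this lists unigrams and bigrams.
        System.out.println(ngrams("Weka loads saved models", 1, 2));
    }
}
```

Each string in the returned list plays the role of one candidate attribute in the word vector, which is why a larger maximum size grows the feature space quickly.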
Create the LibLinear Classifier
- SVMType 0 below corresponds to L2-regularized logistic regression
- Set setProbabilityEstimates(true) to print the output probabilities
Classifier cls = null;
LibLINEAR liblinear = new LibLINEAR();
liblinear.setSVMType(new SelectedTag(0, LibLINEAR.TAGS_SVMTYPE));
liblinear.setProbabilityEstimates(true);
// liblinear.setBias(1); // default value
cls = liblinear;
cls.buildClassifier(trainingData);
Save model
System.out.println("Saving the model...");
ObjectOutputStream oos;
oos = new ObjectOutputStream(new FileOutputStream(path+"mymodel.model"));
oos.writeObject(cls);
oos.flush();
oos.close();
Create testing instances from .arff file
Instances testingData = getDataFromFile(pathToArffFile); // pathToArffFile here points to the test .arff file
Load classifier
Classifier myCls = (Classifier) weka.core.SerializationHelper.read(path+"mymodel.model");
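The save and load steps fit together because a Weka model is a serializable Java object. As far as I know, SerializationHelper.read goes through standard Java object deserialization, so a file written with ObjectOutputStream can be read back by it. Below is a minimal sketch of that round trip using only the JDK. TinyModel is a hypothetical stand-in for a trained classifier, and an in-memory byte array replaces the model file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

class ModelRoundTrip {

    // Hypothetical stand-in for a trained model; not a Weka class.
    static class TinyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final double[] weights;

        TinyModel(double[] weights) {
            this.weights = weights;
        }
    }

    // Serializes the model; try-with-resources flushes and closes the stream.
    static byte[] save(Serializable model) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(model);
        }
        return bytes.toByteArray();
    }

    // Deserializes whatever object was stored; the caller casts it.
    static Object load(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return ois.readObject();
        }
    }

    // Convenience wrapper that saves and reloads a model in memory.
    static TinyModel roundTrip(TinyModel model) {
        try {
            return (TinyModel) load(save(model));
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("In-memory round trip failed", e);
        }
    }

    public static void main(String[] args) {
        TinyModel restored = roundTrip(new TinyModel(new double[] {0.25, -1.5}));
        System.out.println(Arrays.toString(restored.weights));
    }
}
```

With a real model you would swap the byte array for a FileOutputStream and FileInputStream on the model path. The class that was saved must also be on the classpath of the program that loads it.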
Use the same StringToWordVector filter as above, or create a new one for testingData, but remember to use the trainingData for this command: filter.setInputFormat(trainingData);
This will make training and testing instances compatible.
Alternatively you could use InputMappedClassifier
Apply the filter to testingData: testingData = Filter.useFilter(testingData, filter);
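To illustrate why the filter has to be initialized with the training data, here is a minimal plain-Java sketch of a bag-of-words vectorizer with a fixed vocabulary. It is only an analogy for what StringToWordVector does, not its implementation. The class SharedVocabularySketch and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SharedVocabularySketch {

    // Builds a word -> column index map from the training documents only.
    static Map<String, Integer> buildVocabulary(List<String> trainingDocs) {
        Map<String, Integer> vocab = new LinkedHashMap<>();
        for (String doc : trainingDocs) {
            for (String word : doc.toLowerCase().split("\\s+")) {
                // A new word gets the next free column; known words keep theirs.
                vocab.putIfAbsent(word, vocab.size());
            }
        }
        return vocab;
    }

    // Counts the words of one document into a vector whose layout is fixed by vocab.
    // Words that never occurred in training have no column and are dropped.
    static int[] vectorize(String doc, Map<String, Integer> vocab) {
        int[] counts = new int[vocab.size()];
        for (String word : doc.toLowerCase().split("\\s+")) {
            Integer column = vocab.get(word);
            if (column != null) {
                counts[column]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> vocab = buildVocabulary(Arrays.asList("good movie", "bad movie"));
        // The test document is mapped onto the training columns.
        System.out.println(Arrays.toString(vectorize("good good film", vocab)));
    }
}
```

If the vocabulary were rebuilt from the test documents instead, the columns would mean different words than the ones the model was trained on, which is the incompatibility the setInputFormat(trainingData) call avoids.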
Classify!
1. Get the class value for every instance in the testing set
for (int j = 0; j < testingData.numInstances(); j++) {
double res = myCls.classifyInstance(testingData.get(j));
}
res is a double value that corresponds to the nominal class defined in the .arff file. To get the nominal class, use: testingData.classAttribute().value((int) res)
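The returned double is just an index into the list of class labels. A minimal plain-Java sketch of that lookup, with a hypothetical label list standing in for the class attribute declared in the .arff header:

```java
import java.util.Arrays;
import java.util.List;

class ClassIndexSketch {

    // Maps the double returned by a classifier to the label at that index.
    static String labelFor(double predictedIndex, List<String> classLabels) {
        return classLabels.get((int) predictedIndex);
    }

    public static void main(String[] args) {
        // Hypothetical labels, in the order they would be declared in the .arff header.
        List<String> labels = Arrays.asList("negative", "neutral", "positive");
        System.out.println(labelFor(2.0, labels));
    }
}
```

The order of the labels matters: index 0 is the first value listed for the class attribute, index 1 the second, and so on.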
2. Get the probability distribution for every instance
for (int j = 0; j < testingData.numInstances(); j++) {
double[] dist = myCls.distributionForInstance(testingData.get(j));
}
dist is a double array that contains the probabilities for every class defined in the .arff file.
Note: the classifier must support probability distributions, and they must be enabled, as in the LibLINEAR setup above: liblinear.setProbabilityEstimates(true);
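Once you have the distribution, the predicted class is the index with the highest probability, and that probability can serve as a rough confidence score. A minimal plain-Java sketch, where DistributionSketch and argMax are illustrative names and the array is made-up data:

```java
class DistributionSketch {

    // Returns the index of the largest probability (the first one wins on ties).
    static int argMax(double[] dist) {
        int best = 0;
        for (int i = 1; i < dist.length; i++) {
            if (dist[i] > dist[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up distribution over three classes.
        double[] dist = {0.1, 0.7, 0.2};
        int predicted = argMax(dist);
        System.out.println("class index " + predicted + " with probability " + dist[predicted]);
    }
}
```

The resulting index can be turned into a label the same way as the value returned by classifyInstance.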