Get prediction percentage in WEKA using own Java c

2020-06-18 10:01发布

问题:

Overview

I know that one can get the percentages of each prediction in a trained WEKA model through the GUI and command line options as conveniently explained and demonstrated in the documentation article "Making predictions".

Predictions

I know that there are three ways documented to get these predictions:

  1. command line
  2. GUI
  3. Java code/using the WEKA API, which I was able to do in the answer to "Get risk predictions in WEKA using own Java code"
  4. this fourth one requires a generated WEKA .MODEL file

I have a trained .MODEL file and now I want to classify new instances using this together with the prediction percentages similar to the one below (an output of the GUI's Explorer, in CSV format):

inst#,actual,predicted,error,distribution,
1,1:0,2:1,+,0.399409,*0.7811
2,1:0,2:1,+,0.3932409,*0.8191
3,1:0,2:1,+,0.399409,*0.600591
4,1:0,2:1,+,0.139409,*0.64
5,1:0,2:1,+,0.399409,*0.600593
6,1:0,2:1,+,0.3993209,*0.600594
7,1:0,2:1,+,0.500129,*0.600594
8,1:0,2:1,+,0.399409,*0.90011
9,1:0,2:1,+,0.211409,*0.60182
10,1:0,2:1,+,0.21909,*0.11101

The predicted column is what I want to get from a .MODEL file.


What I know

Based from my experience with the WEKA API approach, one can get these predictions using the following code (the PlainText inserted into an Evaluation object) BUT I do not want to do k-fold cross-validation that is provided by the Evaluation object.

StringBuffer predictionSB = new StringBuffer();
Range attributesToShow = null;
Boolean outputDistributions = new Boolean(true);

PlainText predictionOutput = new PlainText();
predictionOutput.setBuffer(predictionSB);
predictionOutput.setOutputDistribution(true);

Evaluation evaluation = new Evaluation(data);
evaluation.crossValidateModel(j48Model, data, numberOfFolds,
        randomNumber, predictionOutput, attributesToShow,
        outputDistributions);

System.out.println(predictionOutput.getBuffer());

From the WEKA documentation

Note that a .MODEL file classifies data from an .ARFF or related input is discussed in "Use Weka in your Java code" and "Serialization" a.k.a. "How to use a .MODEL file in your own Java code to classify new instances" (why the vague title smfh).

Using own Java code to classify

Loading a .MODEL file is through "Deserialization" and the following is for versions > 3.5.5:

// deserialize model
Classifier cls = (Classifier) weka.core.SerializationHelper.read("/some/where/j48.model");

An Instance object is the data and it is fed to the classifyInstance. An output is provided here (depending on the data type of the outcome attribute):

// classify an Instance object (testData)
cls.classifyInstance(testData.instance(0));

The question "How to reuse saved classifier created from explorer(in weka) in eclipse java" has a great answer too!

Javadocs

I have already checked the Javadocs for Classifier (the trained model) and Evaluation (just in case) but none directly and explicitly addresses this issue.

The only thing closest to what I want is the classifyInstances method of the Classifier:

Classifies the given test instance. The instance has to belong to a dataset when it's being classified. Note that a classifier MUST implement either this or distributionForInstance().


How can I simultaneously use a WEKA .MODEL file to classify and get predictions of a new instance using my own Java code (aka using the WEKA API)?

回答1:

This answer simply updates my answer from How to reuse saved classifier created from explorer(in weka) in eclipse java.

I will show how to obtain the predicted instance value and the prediction percentage (or distribution). The example model is a J48 decision tree created and saved in the Weka Explorer. It was built from the nominal weather data provided with Weka. It is called "tree.model".

import weka.classifiers.Classifier;
import weka.core.Instances;

public class Main {

    public static void main(String[] args) throws Exception
    {
        String rootPath="/some/where/"; 
        Instances originalTrain= //instances here

        //load model
        Classifier cls = (Classifier) weka.core.SerializationHelper.read(rootPath+"tree.model");

        //predict instance class values
        Instances originalTrain= //load or create Instances to predict

        //which instance to predict class value
        int s1=0;

        //perform your prediction
        double value=cls.classifyInstance(originalTrain.instance(s1));

        //get the prediction percentage or distribution
        double[] percentage=cls.distributionForInstance(originalTrain.instance(s1));

        //get the name of the class value
        String prediction=originalTrain.classAttribute().value((int)value); 

        System.out.println("The predicted value of instance "+
                                Integer.toString(s1)+
                                ": "+prediction); 

        //Format the distribution
        String distribution="";
        for(int i=0; i <percentage.length; i=i+1)
        {
            if(i==value)
            {
                distribution=distribution+"*"+Double.toString(percentage[i])+",";
            }
            else
            {
                distribution=distribution+Double.toString(percentage[i])+",";
            }
        }
        distribution=distribution.substring(0, distribution.length()-1);

        System.out.println("Distribution:"+ distribution);
    }

}

The output from this is:

The predicted value of instance 0: no  
Distribution: *1, 0