WEKA classification likelihood of the classes

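I would like to know if there is a way in WEKA to output a number of "best guesses" for a classification.

My scenario is this: I classify the data with, for instance, cross-validation, and then in WEKA's output I would like to get something like: "these are the 3 best guesses for the classification of this instance". What I want is that, even if an instance is not correctly classified, I get an output of the 3 or 5 best guesses for that instance.

Example:

Classes: A, B, C, D, E. Instances: 1 ... 10

And the output would be: instance 1 is 90% likely to be class A, 75% likely to be class B, 60% likely to be class C...

Thanks.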

Answer 1:

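The Weka API has a method called Classifier.distributionForInstance() that can be used to get the classification prediction distribution. You can then sort the distribution by decreasing probability to get your top-N predictions.

Below is a function that prints out: (1) the test instance's ground-truth label; (2) the predicted label from classifyInstance(); and (3) the prediction distribution from distributionForInstance(). I have used this with J48, but it should work with other classifiers.

The input parameters are the serialized model file (which you can create during the model training phase by applying the -d option) and the test file in ARFF format.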

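// Requires: weka.classifiers.Classifier, weka.core.Instances and
// weka.core.converters.ConverterUtils.DataSource.
public void test(String modelFileSerialized, String testFileARFF) 
    throws Exception
{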
    // Deserialize the classifier.
    Classifier classifier = 
        (Classifier) weka.core.SerializationHelper.read(
            modelFileSerialized);

    // Load the test instances.
    Instances testInstances = DataSource.read(testFileARFF);

    // Mark the last attribute in each instance as the true class.
    testInstances.setClassIndex(testInstances.numAttributes()-1);

    int numTestInstances = testInstances.numInstances();
    System.out.printf("There are %d test instances\n", numTestInstances);

    // Loop over each test instance.
    for (int i = 0; i < numTestInstances; i++)
    {
        // Get the true class label from the instance's own classIndex.
        String trueClassLabel = 
            testInstances.instance(i).toString(testInstances.classIndex());

        // Make the prediction here.
        double predictionIndex = 
            classifier.classifyInstance(testInstances.instance(i)); 

        // Get the predicted class label from the predictionIndex.
        String predictedClassLabel =
            testInstances.classAttribute().value((int) predictionIndex);

        // Get the prediction probability distribution.
        double[] predictionDistribution = 
            classifier.distributionForInstance(testInstances.instance(i)); 

        // Print out the true label, predicted label, and the distribution.
        System.out.printf("%5d: true=%-10s, predicted=%-10s, distribution=", 
                          i, trueClassLabel, predictedClassLabel); 

        // Loop over all the prediction labels in the distribution.
        for (int predictionDistributionIndex = 0; 
             predictionDistributionIndex < predictionDistribution.length; 
             predictionDistributionIndex++)
        {
            // Get this distribution index's class label.
            String predictionDistributionIndexAsClassLabel = 
                testInstances.classAttribute().value(
                    predictionDistributionIndex);

            // Get the probability.
            double predictionProbability = 
                predictionDistribution[predictionDistributionIndex];

            System.out.printf("[%10s : %6.3f]", 
                              predictionDistributionIndexAsClassLabel, 
                              predictionProbability );
        }

        System.out.printf("\n");
    }
}
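The function above prints the whole distribution in class-index order. To get the top-N guesses the answer describes, the array can be sorted by decreasing probability. A minimal dependency-free sketch, assuming the input is the double[] that distributionForInstance() returns (index j corresponds to class value j of the class attribute); the names TopNPredictions, topN and describe are hypothetical, not part of Weka:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class TopNPredictions {

    /**
     * Returns the indices of the n largest entries of dist, highest first.
     * dist is assumed to be a class-probability array such as the one
     * returned by Classifier.distributionForInstance().
     */
    public static int[] topN(double[] dist, int n) {
        Integer[] idx = new Integer[dist.length];
        for (int i = 0; i < dist.length; i++) {
            idx[i] = i;
        }
        // Sort class indices by their probability, highest first
        // (stable sort, so ties keep the lower class index first).
        Arrays.sort(idx, Comparator.comparingDouble((Integer i) -> dist[i]).reversed());
        int k = Math.min(n, dist.length);
        int[] result = new int[k];
        for (int i = 0; i < k; i++) {
            result[i] = idx[i];
        }
        return result;
    }

    /** Formats the top-n predictions as "label (xx.x%)" entries. */
    public static String describe(double[] dist, String[] labels, int n) {
        List<String> parts = new ArrayList<>();
        for (int i : topN(dist, n)) {
            parts.add(String.format(Locale.ROOT, "%s (%.1f%%)", labels[i], 100 * dist[i]));
        }
        return String.join(", ", parts);
    }

    public static void main(String[] args) {
        // Hypothetical distribution over the five classes from the question.
        double[] dist = {0.05, 0.60, 0.10, 0.20, 0.05};
        String[] labels = {"A", "B", "C", "D", "E"};
        System.out.println(describe(dist, labels, 3)); // prints B (60.0%), D (20.0%), C (10.0%)
    }
}
```

Inside the loop of test(), this could be called as TopNPredictions.describe(predictionDistribution, labels, 3), where labels is a hypothetical String[] filled once before the loop with labels[j] = testInstances.classAttribute().value(j).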

Answer 2:

I don't know whether you can do it natively, but you can get the probability of each class, sort them, and take the top three.

The function you want is distributionForInstance(Instance instance), which returns a double[] giving the probability of each class.
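Taking only the top three does not strictly require sorting the whole array. A sketch using a bounded min-heap (java.util.PriorityQueue), assuming the input is the double[] from distributionForInstance() indexed by class value; TopKHeap and topK are illustrative names:

```java
import java.util.Arrays;
import java.util.PriorityQueue;

public class TopKHeap {

    /**
     * Returns the indices of the k most probable classes in dist, most
     * probable first, keeping at most k candidates in a min-heap.
     */
    public static int[] topK(double[] dist, int k) {
        // Min-heap ordered by probability: its head is the weakest candidate kept so far.
        PriorityQueue<Integer> heap =
            new PriorityQueue<>((a, b) -> Double.compare(dist[a], dist[b]));
        for (int i = 0; i < dist.length; i++) {
            heap.add(i);
            if (heap.size() > k) {
                heap.poll(); // discard the least probable candidate
            }
        }
        // The heap drains in ascending order, so fill the result from the back.
        int[] result = new int[heap.size()];
        for (int j = result.length - 1; j >= 0; j--) {
            result[j] = heap.poll();
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical distribution over classes A..E.
        double[] dist = {0.05, 0.60, 0.10, 0.20, 0.05};
        System.out.println(Arrays.toString(topK(dist, 3))); // prints [1, 3, 2]
    }
}
```

With only a handful of classes a plain sort is just as good in practice; the heap only pays off when there are many classes and k is small.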

Answer 3:

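Not in general. The information you want is not available for all classifiers: in most cases (e.g. decision trees), the decision is clear-cut (though possibly incorrect) and comes without a confidence value. Your task requires a classifier that can handle uncertainty (such as the naive Bayes classifier).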

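Technically, the easiest thing to do is probably to train the model and then classify a single instance, for which Weka should give you the desired output. In general you can of course also do this for sets of instances, but I don't think Weka provides this out of the box. You will probably have to customize the code or use Weka through an API (in R, for example).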
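To illustrate why a naive Bayes classifier yields a per-class probability rather than a bare decision: it scores each class c by P(c) times the product of P(x_f | c) over the features, then normalises the scores so they sum to 1. The sketch below is a textbook categorical naive Bayes with add-one (Laplace) smoothing over hypothetical count tables; it is not Weka's NaiveBayes implementation, whose estimators may differ in detail, and TinyNaiveBayes/posterior are illustrative names:

```java
import java.util.Arrays;
import java.util.Locale;

public class TinyNaiveBayes {

    /**
     * Computes P(class | x) for a categorical naive Bayes model.
     * classCounts[c]         = number of training instances of class c (assumed > 0)
     * featureCounts[c][f][v] = number of class-c instances whose feature f has value v
     * x[f]                   = observed value index of feature f
     */
    public static double[] posterior(int[] classCounts, int[][][] featureCounts, int[] x) {
        int numClasses = classCounts.length;
        int total = Arrays.stream(classCounts).sum();
        double[] logScore = new double[numClasses];
        for (int c = 0; c < numClasses; c++) {
            // log P(c)
            logScore[c] = Math.log((double) classCounts[c] / total);
            for (int f = 0; f < x.length; f++) {
                int numValues = featureCounts[c][f].length;
                // Add-one smoothed estimate of P(x_f | c)
                double p = (featureCounts[c][f][x[f]] + 1.0) / (classCounts[c] + numValues);
                logScore[c] += Math.log(p);
            }
        }
        // Normalise via log-sum-exp so the probabilities sum to 1.
        double max = Arrays.stream(logScore).max().getAsDouble();
        double[] dist = new double[numClasses];
        double sum = 0;
        for (int c = 0; c < numClasses; c++) {
            dist[c] = Math.exp(logScore[c] - max);
            sum += dist[c];
        }
        for (int c = 0; c < numClasses; c++) {
            dist[c] /= sum;
        }
        return dist;
    }

    public static void main(String[] args) {
        // Hypothetical training counts: classes {yes, no}, one feature with two values.
        int[] classCounts = {3, 1};
        int[][][] featureCounts = { { {3, 0} }, { {0, 1} } };
        double[] dist = posterior(classCounts, featureCounts, new int[] {0});
        System.out.printf(Locale.ROOT, "yes=%.3f no=%.3f%n", dist[0], dist[1]);
        // prints yes=0.878 no=0.122
    }
}
```

Here the unnormalised scores are 0.75 × 0.8 = 0.6 for "yes" and 0.25 × 1/3 ≈ 0.083 for "no", so P(yes | x) = 36/41 ≈ 0.878.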

Answer 4:

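How exactly do you calculate the probability for an instance?

I have posted my PART rules and the data for a new instance here, but I'm not sure how to do the calculation manually! Thanks

EDIT: now calculated using: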

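private float[] getProbDist(String split) {

    // PART/J48 leaves carry counts such as (52/2). In Weka's notation the
    // first number is the (weighted) number of training instances covered
    // by the rule and the second is how many of them were misclassified;
    // a single number such as (52) means none were misclassified.
    // Assumption: `split` holds exactly that text, e.g. "(52/2)" or "(52.0)".
    String[] prob_dis = split.replaceAll("[()\\s]", "").split("/");

    if (prob_dis.length > 2)
        return null;

    if (prob_dis.length == 1) {
        // No error count given: zero misclassified instances.
        prob_dis = new String[] { prob_dis[0], "0" };
    }

    float p1 = Float.parseFloat(prob_dis[0]); // instances covered
    float p2 = Float.parseFloat(prob_dis[1]); // instances misclassified
    // assumes two tags (classes)
    float[] tag_prob = new float[2];

    // tag_prob[0]: fraction misclassified (mass left for the other class);
    // tag_prob[1]: fraction correct (the rule's predicted class).
    tag_prob[0] = p2 / p1;
    tag_prob[1] = 1 - tag_prob[0];

    // returns float[] holding the two probabilities
    return tag_prob;
}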
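Worked through for the (52/2) count, using Weka's (N/E) reading of leaf counts (N instances covered by the rule, E of them misclassified) and a plain frequency estimate. This is a hand calculation under that assumption; distributionForInstance() for PART may report slightly different values, for example when instance weights are involved.

```latex
\begin{aligned}
N &= 52, \qquad E = 2 \\
\hat{P}(\text{rule's class}) &= \frac{N - E}{N} = \frac{52 - 2}{52} = \frac{50}{52} \approx 0.962 \\
\hat{P}(\text{other class}) &= \frac{E}{N} = \frac{2}{52} \approx 0.038
\end{aligned}
```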

Source: WEKA classification likelihood of the classes