weka.core.UnassignedDatasetException when creating an unlabeled instance

Posted 2019-02-08 14:05

Question:

I trained an IBk classifier on some training data that I created manually, as follows:

import java.util.ArrayList;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;

// Four numeric attributes (a, b, c, d) plus a nominal class attribute with values C1 and C2
ArrayList<Attribute> atts = new ArrayList<Attribute>();
ArrayList<String> classVal = new ArrayList<String>();
classVal.add("C1");
classVal.add("C2");
atts.add(new Attribute("a"));
atts.add(new Attribute("b"));
atts.add(new Attribute("c"));
atts.add(new Attribute("d"));
atts.add(new Attribute("@@class@@", classVal));

// Empty dataset with this header; the last attribute is the class
Instances dataRaw = new Instances("TestInstances", atts, 0);
dataRaw.setClassIndex(dataRaw.numAttributes() - 1);

// Each row holds a, b, c, d and the class as an index into classVal (0 = C1, 1 = C2)
double[] instanceValue1 = new double[]{3,0,1,0,0};
dataRaw.add(new DenseInstance(1.0, instanceValue1));

double[] instanceValue2 = new double[]{2,1,1,0,0};
dataRaw.add(new DenseInstance(1.0, instanceValue2));

double[] instanceValue3 = new double[]{2,0,2,0,0};
dataRaw.add(new DenseInstance(1.0, instanceValue3));

double[] instanceValue4 = new double[]{1,3,0,0,1};
dataRaw.add(new DenseInstance(1.0, instanceValue4));

double[] instanceValue5 = new double[]{0,3,1,0,1};
dataRaw.add(new DenseInstance(1.0, instanceValue5));

double[] instanceValue6 = new double[]{0,2,1,1,1};
dataRaw.add(new DenseInstance(1.0, instanceValue6));

Then I build the classifier:

IBk ibk = new IBk();
try {
    ibk.buildClassifier(dataRaw);

} catch (Exception e) {
    e.printStackTrace();
}

I want to create a new instance with an unlabeled class and classify it. I tried the following, with no luck:

IBk ibk = new IBk();
try {
    ibk.buildClassifier(dataRaw);
    double[] values = new double[]{3,1,0,0,-1};
    DenseInstance newInst = new DenseInstance(1.0,values);
    double classif = ibk.classifyInstance(newInst);
    System.out.println(classif);
} catch (Exception e) {
    e.printStackTrace();
}

I just get the following error:

weka.core.UnassignedDatasetException: DenseInstance doesn't have access to a dataset!
at weka.core.AbstractInstance.classAttribute(AbstractInstance.java:98)
at weka.classifiers.AbstractClassifier.classifyInstance(AbstractClassifier.java:74)
at TextCategorizationTest.instancesWithDoubleValues(TextCategorizationTest.java:136)
at TextCategorizationTest.main(TextCategorizationTest.java:33)

It looks like I am doing something wrong when creating the new instance. How exactly do I create an unlabeled instance?

Thanks in advance.

Answer 1:

The problem is with this line:

double classif = ibk.classifyInstance(newInst);

When you try to classify newInst, Weka throws an exception because newInst has no Instances object (i.e., dataset) associated with it; as a result, it knows nothing about its class attribute.

You should first create a new Instances object with the same attributes as dataRaw, add your unlabeled instance to it, set the class index, and only then classify it, e.g.:

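// Empty dataset with the same attributes as dataRaw
Instances dataUnlabeled = new Instances("TestInstances", atts, 0);
dataUnlabeled.add(newInst);
dataUnlabeled.setClassIndex(dataUnlabeled.numAttributes() - 1);
// add() stores a copy of newInst that is linked to dataUnlabeled, so classify that copy
double classif = ibk.classifyInstance(dataUnlabeled.firstInstance());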
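Below is a self-contained sketch of this approach. It assumes Weka 3.7 or later (the ArrayList-based API) is on the classpath, and the class and method names are hypothetical. It differs from the snippet above in two deliberate ways. First, it copies the training header with the Instances(Instances, int) constructor instead of re-declaring the attribute list. Second, it marks the class as missing with Utils.missingValue() instead of -1. Weka stores missing values as NaN, and -1 is not its missing-value marker.

```java
import java.util.ArrayList;
import weka.classifiers.lazy.IBk;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;
import weka.core.Utils;

class UnlabeledViaInstances {

    // Rebuilds the question's training set: numeric a..d, nominal class {C1, C2}.
    static Instances trainingData() {
        ArrayList<Attribute> atts = new ArrayList<Attribute>();
        for (String name : new String[]{"a", "b", "c", "d"}) {
            atts.add(new Attribute(name));
        }
        ArrayList<String> classVal = new ArrayList<String>();
        classVal.add("C1");
        classVal.add("C2");
        atts.add(new Attribute("@@class@@", classVal));

        Instances data = new Instances("TestInstances", atts, 0);
        data.setClassIndex(data.numAttributes() - 1);
        double[][] rows = {
            {3, 0, 1, 0, 0}, {2, 1, 1, 0, 0}, {2, 0, 2, 0, 0},
            {1, 3, 0, 0, 1}, {0, 3, 1, 0, 1}, {0, 2, 1, 1, 1}
        };
        for (double[] row : rows) {
            data.add(new DenseInstance(1.0, row));
        }
        return data;
    }

    // Trains IBk and returns the predicted label for the features (a, b, c, d).
    static String classify(double[] features) throws Exception {
        Instances train = trainingData();
        IBk ibk = new IBk();
        ibk.buildClassifier(train);

        // Empty dataset that copies train's header, including the class index.
        Instances unlabeled = new Instances(train, 0);

        double[] values = new double[unlabeled.numAttributes()];
        System.arraycopy(features, 0, values, 0, features.length);
        values[unlabeled.classIndex()] = Utils.missingValue(); // class is unknown
        unlabeled.add(new DenseInstance(1.0, values));

        // add() stores a copy linked to 'unlabeled', so classify that copy.
        double predicted = ibk.classifyInstance(unlabeled.firstInstance());
        return unlabeled.classAttribute().value((int) predicted);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(classify(new double[]{3, 1, 0, 0}));
    }
}
```

IBk uses k = 1 by default. The query (3, 1, 0, 0) is equally close to two training rows, and both are labeled C1, so this should print C1. The last line converts the numeric prediction (an index into the class values) back into a label.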


Answer 2:

You will see this error when you classify a new instance that is not associated with a dataset. You have to associate every new instance you create with an Instances object using setDataset.

// Make a placeholder Instances object with the same attributes as the training data
// If you already have access to one (e.g. dataRaw), you can skip this step
Instances dataset = new Instances("testdata", atts, 1);
dataset.setClassIndex(dataset.numAttributes() - 1);

DenseInstance newInst = new DenseInstance(1.0, values);

// Associate your instance with the Instances object, in this case dataset
newInst.setDataset(dataset);

After this, you can classify the newly created instance:

double classif = ibk.classifyInstance(newInst);
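A runnable sketch of the setDataset route follows. It assumes Weka 3.7 or later, and the class and method names are hypothetical. It reuses the training set itself as the header (the "skip this step" case) and marks the class as missing with setClassMissing(). It also shows that setDataset() only stores a reference, so the training set does not grow.

```java
import java.util.ArrayList;
import java.util.Arrays;
import weka.classifiers.lazy.IBk;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;

class UnlabeledViaSetDataset {

    // Rebuilds the question's training set: numeric a..d, nominal class {C1, C2}.
    static Instances trainingData() {
        ArrayList<Attribute> atts = new ArrayList<Attribute>();
        for (String name : new String[]{"a", "b", "c", "d"}) {
            atts.add(new Attribute(name));
        }
        ArrayList<String> classVal = new ArrayList<String>();
        classVal.add("C1");
        classVal.add("C2");
        atts.add(new Attribute("@@class@@", classVal));

        Instances data = new Instances("TestInstances", atts, 0);
        data.setClassIndex(data.numAttributes() - 1);
        double[][] rows = {
            {3, 0, 1, 0, 0}, {2, 1, 1, 0, 0}, {2, 0, 2, 0, 0},
            {1, 3, 0, 0, 1}, {0, 3, 1, 0, 1}, {0, 2, 1, 1, 1}
        };
        for (double[] row : rows) {
            data.add(new DenseInstance(1.0, row));
        }
        return data;
    }

    // Classifies features (a, b, c, d); 'header' only supplies attribute and class info.
    static String predict(IBk ibk, Instances header, double[] features) throws Exception {
        // Pad with a placeholder class slot; setClassMissing() overwrites it below.
        double[] values = Arrays.copyOf(features, header.numAttributes());
        DenseInstance newInst = new DenseInstance(1.0, values);

        // Only stores a reference: newInst is NOT added to 'header'.
        newInst.setDataset(header);
        newInst.setClassMissing();

        double classif = ibk.classifyInstance(newInst);
        return header.classAttribute().value((int) classif);
    }

    public static void main(String[] args) throws Exception {
        Instances train = trainingData();
        IBk ibk = new IBk();
        ibk.buildClassifier(train);

        System.out.println(predict(ibk, train, new double[]{3, 1, 0, 0}));
        System.out.println(predict(ibk, train, new double[]{0, 3, 0, 1}));
        System.out.println(train.numInstances()); // still 6
    }
}
```

This avoids copying the instance into a second dataset. The trade-off is that the header object must stay alive and unchanged for as long as the instance is used.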

http://www.cs.tufts.edu/~ablumer/weka/doc/weka.core.Instance.html




Answer 3:

See pages 203-204 of the WEKA documentation. That helped me a lot! (The WEKA Manual is a PDF file located in your Weka installation folder. Just open documentation.html and it will point you to the PDF manual.)

Copy-pasting a few of the code listings from Chapter 17 (Using the WEKA API / Creating datasets in memory) should help you solve the task.
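The sketch below follows the spirit of that chapter. It was written for this answer and is not copied from the manual. It assumes Weka 3.7 or later, and the class name is hypothetical. It builds the question's training data but sets each class by its label rather than by index. This only works after setDataset() has given the instance access to the header, which is the same requirement behind the exception in the question.

```java
import java.util.ArrayList;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;

class InMemoryDataset {

    // Builds the question's training data, giving each class by its label.
    static Instances build() {
        ArrayList<Attribute> atts = new ArrayList<Attribute>();
        for (String name : new String[]{"a", "b", "c", "d"}) {
            atts.add(new Attribute(name));
        }
        ArrayList<String> classVal = new ArrayList<String>();
        classVal.add("C1");
        classVal.add("C2");
        atts.add(new Attribute("@@class@@", classVal));

        Instances data = new Instances("TestInstances", atts, 0);
        data.setClassIndex(data.numAttributes() - 1);

        double[][] features = {
            {3, 0, 1, 0}, {2, 1, 1, 0}, {2, 0, 2, 0},
            {1, 3, 0, 0}, {0, 3, 1, 0}, {0, 2, 1, 1}
        };
        String[] labels = {"C1", "C1", "C1", "C2", "C2", "C2"};

        for (int r = 0; r < features.length; r++) {
            Instance inst = new DenseInstance(data.numAttributes()); // all values missing
            // The label is looked up in the dataset's class attribute, so link it first.
            inst.setDataset(data);
            for (int i = 0; i < features[r].length; i++) {
                inst.setValue(i, features[r][i]);
            }
            inst.setValue(data.classIndex(), labels[r]);
            data.add(inst); // stores a copy linked to 'data'
        }
        return data;
    }

    public static void main(String[] args) {
        System.out.println(build()); // prints the dataset in ARFF format
    }
}
```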