Removing instances in Weka

Posted 2019-05-26 02:38

Question:

I am using the Weka Java API, and in my code I am trying to do something like the following:

for (each instance i in the training/test set)
    if (condition == TRUE)
        remove instance (i) from training/test set;

[Edit] For example, I have 1000 instances, and for each one I want to check whether a particular condition is met. If the condition is true, I remove that instance from the training/test set.

I believe that Weka does not have an option for direct removal of instances in this way. Any suggestions, pros?

Answer 1:

I don't see the problem here.

Naive method

Iterate over all instances in the data set and remove the ones that match your condition.

import weka.core.Instance;
import weka.core.Instances;

Instances data;
...

// It's important to iterate from last to first: when we remove
// an instance, the instances after it shift down by one position.
for (int i = data.numInstances() - 1; i >= 0; i--) {
    Instance inst = data.instance(i);
    if (condition(inst)) {   // condition(...) stands for your own test
        data.delete(i);
    }
}
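To see why the loop direction matters, here is a minimal sketch that uses a plain java.util.ArrayList as a stand-in for Weka's Instances. That substitution is an assumption made so the example runs without Weka, and the class and method names (ReverseRemovalDemo, removeForward, removeBackward) are hypothetical. The forward loop skips the element that slides into the freed slot, while the backward loop does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical demo class; an ArrayList stands in for weka.core.Instances.
class ReverseRemovalDemo {

    // Forward loop: after remove(i) the next element moves into slot i,
    // and i++ then steps over it without testing it.
    static List<Integer> removeForward(List<Integer> values, IntPredicate condition) {
        List<Integer> copy = new ArrayList<>(values);
        for (int i = 0; i < copy.size(); i++) {
            if (condition.test(copy.get(i))) {
                copy.remove(i);   // remove(int index), not remove(Object)
            }
        }
        return copy;
    }

    // Backward loop: only elements that were already visited shift,
    // so every element is tested exactly once.
    static List<Integer> removeBackward(List<Integer> values, IntPredicate condition) {
        List<Integer> copy = new ArrayList<>(values);
        for (int i = copy.size() - 1; i >= 0; i--) {
            if (condition.test(copy.get(i))) {
                copy.remove(i);
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 2, 3);
        IntPredicate isTwo = v -> v == 2;
        System.out.println(removeForward(values, isTwo));   // prints [1, 2, 3]
        System.out.println(removeBackward(values, isTwo));  // prints [1, 3]
    }
}
```

The forward version leaves one matching element behind whenever two matches sit next to each other, which is the kind of bug the last-to-first loop avoids.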

Filter method

Use one of Weka's instance filters (supervised or unsupervised), or write your own.

For example, you can use the RemoveWithValues filter and apply batch filtering:

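import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.instance.RemoveWithValues;

Instances data;   // assumed to be loaded already
RemoveWithValues filter = new RemoveWithValues();

String[] options = new String[4];
options[0] = "-C";   // attribute index (1-based)
options[1] = "5";    // the fifth attribute
options[2] = "-S";   // match if value is smaller than
options[3] = "10";   // 10
filter.setOptions(options);   // declared to throw Exception

filter.setInputFormat(data);  // call this after setting the options
Instances newData = Filter.useFilter(data, filter);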
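If you go the "write your own" route, the core idea is the same as in the batch call above: leave the original data untouched and return a new data set without the matching instances. Below is a minimal sketch of that idea in plain Java, not a real Weka filter. The rows are double[] arrays standing in for instances, and CopyFilterDemo and removeMatching are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical demo class; each double[] row stands in for one instance.
class CopyFilterDemo {

    // Returns a new list without the rows that match the condition.
    // The input list itself is not modified.
    static List<double[]> removeMatching(List<double[]> rows, Predicate<double[]> condition) {
        List<double[]> kept = new ArrayList<>();
        for (double[] row : rows) {
            if (!condition.test(row)) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<double[]> rows = Arrays.asList(
                new double[] {1.0, 4.0},
                new double[] {2.0, 12.0},
                new double[] {3.0, 9.5});

        // Drop every row whose second value is smaller than 10.
        List<double[]> result = removeMatching(rows, row -> row[1] < 10);
        System.out.println(rows.size() + " -> " + result.size());   // prints 3 -> 1
    }
}
```

Because nothing is deleted in place, there is no index shifting to worry about, and the original set is still available afterwards.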