Let's say we have the following dataset:
X1: {4,7,0,1}
X2: {4,3,2,1}
X3: {6,6,6,6}
I'd like to remove any instance that has an attribute with a value greater than 5; in this example, X1 and X3 should be removed. I have more than 500 attributes, and I've tried to use:
SubsetByExpression -E "(ATT1 < 6) or ... or (ATT500 < 6)"
which did filter most of the instances, but there are still some instances that have values greater than 5 (I'm not really sure why it removed some and retained others).
Is there another, more appropriate filter to use, or any other way to achieve this task from within WEKA?
Update:
Here's a concrete example. The ARFF file's content:
@relation Test
@attribute word_1 NUMERIC
@attribute word_2 NUMERIC
@attribute word_3 NUMERIC
@attribute word_4 NUMERIC
@data
4,7,0,1
4,3,2,1
6,6,6,6
0,5,1,4
I'd like to remove all instances that have an attribute with a value of 6 or more, so the 1st and 3rd rows should be removed. If I use this filter:
SubsetByExpression -E "(ATT1 < 6) or (ATT2 < 6) or (ATT3 < 6) or (ATT4 < 6)"
only one instance is removed (the 3rd); the 1st instance is still there.
The WEKA version I'm using is 3.6.2.
If you change your expression to:
SubsetByExpression -E "(ATT1 < 6) and (ATT2 < 6) and (ATT3 < 6) and (ATT4 < 6)", you get the desired result.
I believe your current expression says to keep an instance as long as at least one of its attribute values is less than six, so the 1st row survives because of its 4, 0 and 1. The new expression says to keep an instance only if all of its attribute values are less than six.
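To see the difference between the two expressions on the four rows from the question, here is a minimal sketch in plain Python. It only models the boolean logic and does not call WEKA; the helper names (build_expression, keep_with_or, keep_with_and) are made up for illustration. The ATT<n> syntax is taken from the expressions above. The build_expression helper may also save typing when there are 500+ attributes, assuming the resulting string is pasted into the -E option.

```python
# Rows from the ARFF example in the question.
rows = [
    [4, 7, 0, 1],
    [4, 3, 2, 1],
    [6, 6, 6, 6],
    [0, 5, 1, 4],
]

THRESHOLD = 6  # an instance is acceptable only if its values are below this


def build_expression(num_attributes, threshold=THRESHOLD, joiner="and"):
    """Hypothetical helper: build the -E string, e.g. '(ATT1 < 6) and (ATT2 < 6)'."""
    clauses = [f"(ATT{i} < {threshold})" for i in range(1, num_attributes + 1)]
    return f" {joiner} ".join(clauses)


def keep_with_or(row):
    # Mirrors "(ATT1 < 6) or (ATT2 < 6) or ...": one small value is enough to keep the row.
    return any(value < THRESHOLD for value in row)


def keep_with_and(row):
    # Mirrors "(ATT1 < 6) and (ATT2 < 6) and ...": every value must be small.
    return all(value < THRESHOLD for value in row)


kept_or = [row for row in rows if keep_with_or(row)]
kept_and = [row for row in rows if keep_with_and(row)]

print(build_expression(4))
print("or keeps: ", kept_or)   # rows 1, 2 and 4 survive; only row 3 is dropped
print("and keeps:", kept_and)  # only rows 2 and 4 survive
```

For the full dataset, build_expression(500) produces the same pattern with 500 clauses joined by "and".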