Overview
I am using the WEKA API 3.7.10 (developer version) to load my pre-built .model
files and get predictions from them.
I made 25 models: one for each of five outcome variables with each of five algorithms.
- J48 decision tree
- Alternating decision tree
- Random forest
- LogitBoost
- Random subspace
I am having problems with J48, RandomSubSpace and RandomForest.
Necessary files
The following is the ARFF representation of my data after creation:
@relation WekaData
@attribute ageDiagNum numeric
@attribute raceGroup {Black,Other,Unknown,White}
@attribute stage3 {0,I,IIA,IIB,IIIA,IIIB,IIIC,IIINOS,IV,'UNK Stage'}
@attribute m3 {M0,M1,MX}
@attribute reasonNoCancerSurg {'Not performed, patient died prior to recommended surgery','Not recommended','Not recommended, contraindicated due to other conditions','Recommended but not performed, patient refused','Recommended but not performed, unknown reason','Recommended, unknown if performed','Surgery performed','Unknown; death certificate or autopsy only case'}
@attribute ext2 {00,05,10,11,13,14,15,16,17,18,20,21,23,24,25,26,27,28,30,31,33,34,35,36,37,38,40,50,60,70,80,85,99}
@attribute time2 {}
@attribute time4 {}
@attribute time6 {}
@attribute time8 {}
@attribute time10 {}
@data
65,White,IIA,MX,'Not recommended, contraindicated due to other conditions',14,?,?,?,?,?
I need to get predictions for the binary attributes time2 to time10 from their respective models.
Below is the code I use to get the predictions from all the model files:
private static Map<String, Object> predict(Instances instances,
        Classifier classifier, int attributeIndex) {
    Map<String, Object> map = new LinkedHashMap<String, Object>();
    int instanceIndex = 0; // do not change, equal to row 1
    double[] percentage = { 0 };
    double outcomeValue = 0;
    AbstractOutput abstractOutput = null;
    if (classifier.getClass() == RandomForest.class
            || classifier.getClass() == RandomSubSpace.class) {
        // has problems predicting time2 to time10
        instances.setClassIndex(5);
    } else {
        // works as intended in LogitBoost and ADTree
        instances.setClassIndex(attributeIndex);
    }
    try {
        outcomeValue = classifier.classifyInstance(instances.instance(instanceIndex));
        percentage = classifier.distributionForInstance(instances
                .instance(instanceIndex));
    } catch (Exception e) {
        e.printStackTrace();
    }
    map.put("Class", outcomeValue);
    if (percentage.length > 0) {
        double percentageRaw = 0;
        if (outcomeValue == 1.0) {
            percentageRaw = percentage[1];
        } else {
            percentageRaw = 1 - percentage[0];
        }
        map.put("Percentage", percentageRaw);
    } else {
        // J48 returns an empty distribution, so indexing percentage[i] would throw
        map.put("Percentage", 0.0);
    }
    return map;
}
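The predicted class and its probability can also be read directly off the distribution array, which avoids branching on the class value. The sketch below is plain Java with no WEKA dependency. `DistributionSummary` and its method names are hypothetical, and it assumes the array has one probability per class label, as `distributionForInstance` returns. An empty array, as described for J48 above, yields index -1 and probability 0. Note that it reports the probability of the predicted class, which differs from the method above when the predicted class is index 0.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of WEKA.
final class DistributionSummary {

    /** Returns the index of the largest probability, or -1 if the array is empty. */
    static int argMax(double[] distribution) {
        int best = -1;
        for (int i = 0; i < distribution.length; i++) {
            if (best < 0 || distribution[i] > distribution[best]) {
                best = i;
            }
        }
        return best;
    }

    /** Builds a "Class"/"Percentage" map; Percentage is the probability of the predicted class. */
    static Map<String, Object> summarize(double[] distribution) {
        Map<String, Object> map = new LinkedHashMap<String, Object>();
        int predicted = argMax(distribution);
        map.put("Class", (double) predicted);
        map.put("Percentage", predicted < 0 ? 0.0 : distribution[predicted]);
        return map;
    }

    public static void main(String[] args) {
        System.out.println(summarize(new double[] {0.2, 0.8}));
        System.out.println(summarize(new double[0]));
    }
}
```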
Here are the models I use to predict the outcome for time2, hence we will use index 6:
instances.setClassIndex(5);
- ADTree model for time2 prediction
- J48 model for time2 prediction
- RandomForest model for time2 prediction
- LogitBoost model for time2 prediction
- RandomSubSpace model for time2 prediction
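Hard-coding the class index is easy to get wrong. In the ARFF header above, time2 is the seventh attribute, so its zero-based index is 6, while index 5 refers to ext2. A minimal sketch of looking the index up by name follows. It is plain Java over the attribute names copied from the header, and `ClassIndexLookup` is a hypothetical helper. With a loaded `Instances` object, `instances.attribute("time2").index()` should give the same number.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of WEKA.
final class ClassIndexLookup {

    // Attribute names in the order they appear in the ARFF header.
    static final List<String> ATTRIBUTES = Arrays.asList(
            "ageDiagNum", "raceGroup", "stage3", "m3", "reasonNoCancerSurg",
            "ext2", "time2", "time4", "time6", "time8", "time10");

    /** Returns the zero-based index of the named attribute, or throws if it is absent. */
    static int indexOf(String name) {
        int index = ATTRIBUTES.indexOf(name);
        if (index < 0) {
            throw new IllegalArgumentException("No attribute named " + name);
        }
        return index;
    }

    public static void main(String[] args) {
        System.out.println("time2 -> " + indexOf("time2"));
        System.out.println("ext2  -> " + indexOf("ext2"));
    }
}
```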
Problems
As I said before, LogitBoost and ADTree have no problem with this straightforward method, unlike the other three. I followed the "Use WEKA in your Java code" tutorial.
[Solved] From my experiments, RandomForest and RandomSubSpace throw an ArrayIndexOutOfBoundsException when told to predict time2 to time10:
java.lang.ArrayIndexOutOfBoundsException: 0
    at weka.classifiers.meta.Bagging.distributionForInstance(Bagging.java:586)
    at weka.classifiers.trees.RandomForest.distributionForInstance(RandomForest.java:602)
    at weka.classifiers.AbstractClassifier.classifyInstance(AbstractClassifier.java:70)
The stack trace points to this line as the root of the error:
outcomeValue = classifier.classifyInstance(instances.instance(0));
Solution: I made a copy-paste error while creating the ARFF file for the binary variables time2 to time10, in how the FastVector<String>() values were assigned to the FastVector<Attribute>() object. All ten of my RandomForest and RandomSubSpace models are working fine right now!
[Solved] The J48 decision tree has a new problem now. Instead of not providing any predictions, it now returns an error:
java.lang.ArrayIndexOutOfBoundsException: 11
    at weka.core.DenseInstance.value(DenseInstance.java:332)
    at weka.core.AbstractInstance.isMissing(AbstractInstance.java:315)
    at weka.classifiers.trees.j48.C45Split.whichSubset(C45Split.java:494)
    at weka.classifiers.trees.j48.ClassifierTree.getProbs(ClassifierTree.java:670)
    at weka.classifiers.trees.j48.ClassifierTree.classifyInstance(ClassifierTree.java:231)
    at weka.classifiers.trees.J48.classifyInstance(J48.java:266)
and it traces to the line
outcomeValue = classifier.classifyInstance(instances.instance(0));
Solution: actually, I just ran the program again with J48 and it worked, giving the outcome variable and the associated distributions.
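The copy-paste error in the first solution above is visible in the ARFF header as nominal attributes with no labels: time2 to time10 are declared as `{}`. A class attribute with zero labels leaves a classifier with no classes to predict, which would be consistent with the `ArrayIndexOutOfBoundsException: 0`, although that link is an inference and not something confirmed here. A minimal sketch of a check for such declarations follows. It is plain Java over the header text, `ArffHeaderCheck` is a hypothetical name, and the regular expression only covers simple unquoted attribute names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of WEKA.
final class ArffHeaderCheck {

    // Matches "@attribute <name> {}" with optional whitespace inside the braces.
    private static final Pattern EMPTY_NOMINAL = Pattern.compile(
            "^\\s*@attribute\\s+(\\S+)\\s+\\{\\s*\\}\\s*$", Pattern.CASE_INSENSITIVE);

    /** Returns the names of nominal attributes declared with an empty label list. */
    static List<String> emptyNominalAttributes(List<String> headerLines) {
        List<String> names = new ArrayList<String>();
        for (String line : headerLines) {
            Matcher m = EMPTY_NOMINAL.matcher(line);
            if (m.matches()) {
                names.add(m.group(1));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> header = Arrays.asList(
                "@attribute m3 {M0,M1,MX}",
                "@attribute time2 {}",
                "@attribute time4 {}");
        System.out.println(emptyNominalAttributes(header));
    }
}
```

Run over the full header shown earlier, this would flag time2, time4, time6, time8 and time10.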
I hope someone can help me sort out this issue. I really do not know what is wrong with this code: I have checked the Javadocs and examples online, and the constant predictions still persist.
(I am currently checking the main program for the WEKA GUI but please help me out here :-) )