I'm trying to retrieve the predicted classes from WEKA using MATLAB and the WEKA API. Everything looks fine, but the classes are always 0. Any idea?
My dataset has 241 attributes, and when I apply WEKA to this dataset directly I get correct results.
First the train and test objects are created, then the classifier is built and classifyInstance is called. But this gives the wrong result:
train = [xtrain ytrain];
test = [xtest];
save ('train.txt','train','-ASCII');
save ('test.txt','test','-ASCII');
%## paths
WEKA_HOME = 'C:\Program Files\Weka-3-7';
javaaddpath([WEKA_HOME '\weka.jar']);
fName = 'train.txt';
%## read file
loader = weka.core.converters.MatlabLoader();
loader.setFile( java.io.File(fName) );
train = loader.getDataSet();
train.setClassIndex( train.numAttributes()-1 );
% setting class as nominal
v(1) = java.lang.String('-R');
v(2) = java.lang.String('242');
options = cat(1,v(1:end));
filter = weka.filters.unsupervised.attribute.NumericToNominal();
filter.setOptions(options);
filter.setInputFormat(train);
train = filter.useFilter(train, filter);
fName = 'test.txt';
%## read file
loader = weka.core.converters.MatlabLoader();
loader.setFile( java.io.File(fName) );
test = loader.getDataSet();
%## dataset
relationName = char(test.relationName);
numAttr = test.numAttributes;
numInst = test.numInstances;
%## classification
classifier = weka.classifiers.trees.J48();
classifier.buildClassifier( train );
fprintf('Classifier: %s %s\n%s', ...
char(classifier.getClass().getName()), ...
char(weka.core.Utils.joinOptions(classifier.getOptions())), ...
char(classifier.toString()) )
classes =[];
for i=1:numInst
classes(i) = classifier.classifyInstance(test.instance(i-1));
end
Here is the new code, but it still doesn't work: classes = 0. The output from Weka for the same algorithm and dataset is OK:

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
                 0.99     0.015    0.985      0.99    0.988      0.991     0
                 0.985    0.01     0.99       0.985   0.988      0.991     1
Weighted Avg.    0.988    0.012    0.988      0.988   0.988      0.991

=== Confusion Matrix ===

    a    b   <-- classified as
 1012   10 |    a = 0
   15 1003 |    b = 1
ytest1 = ones(size(xtest,1),1);
train = [xtrain ytrain];
test = [xtest ytest1];
save ('train.txt','train','-ASCII');
save ('test.txt','test','-ASCII');
%## paths
WEKA_HOME = 'C:\Program Files\Weka-3-7';
javaaddpath([WEKA_HOME '\weka.jar']);
fName = 'train.txt';
%## read file
loader = weka.core.converters.MatlabLoader();
loader.setFile( java.io.File(fName) );
train = loader.getDataSet();
train.setClassIndex( train.numAttributes()-1 );
v(1) = java.lang.String('-R');
v(2) = java.lang.String('242');
options = cat(1,v(1:end));
filter = weka.filters.unsupervised.attribute.NumericToNominal();
filter.setOptions(options);
filter.setInputFormat(train);
train = filter.useFilter(train, filter);
fName = 'test.txt';
%## read file
loader = weka.core.converters.MatlabLoader();
loader.setFile( java.io.File(fName) );
test = loader.getDataSet();
filter = weka.filters.unsupervised.attribute.NumericToNominal();
filter.setOptions( weka.core.Utils.splitOptions('-R last') );
filter.setInputFormat(test);
test = filter.useFilter(test, filter);
%## dataset
relationName = char(test.relationName);
numAttr = test.numAttributes;
numInst = test.numInstances;
%## classification
classifier = weka.classifiers.trees.J48();
classifier.buildClassifier( train );
fprintf('Classifier: %s %s\n%s', ...
char(classifier.getClass().getName()), ...
char(weka.core.Utils.joinOptions(classifier.getOptions())), ...
char(classifier.toString()) )
classes = zeros(numInst,1);
for i=1:numInst
classes(i) = classifier.classifyInstance(test.instance(i-1));
end
Here is a code snippet in Java that prints the predictions together with the class distribution:
// output predictions
// (assumes a trained classifier "cls", a test set "test" with its class index set,
//  and "import weka.core.Utils;")
System.out.println("# - actual - predicted - error - distribution");
for (int i = 0; i < test.numInstances(); i++) {
    double pred = cls.classifyInstance(test.instance(i));
    double[] dist = cls.distributionForInstance(test.instance(i));
    System.out.print((i+1));
    System.out.print(" - ");
    System.out.print(test.instance(i).toString(test.classIndex()));
    System.out.print(" - ");
    System.out.print(test.classAttribute().value((int) pred));
    System.out.print(" - ");
    if (pred != test.instance(i).classValue()) {
        System.out.print("yes");
    } else {
        System.out.print("no");
    }
    System.out.print(" - ");
    System.out.print(Utils.arrayToString(dist));
    System.out.println();
}
I converted it to MATLAB code like this:
classes = zeros(numInst,1);
for i=1:numInst
pred = classifier.classifyInstance(test.instance(i-1));
classes(i) = str2num(char(test.classAttribute().value(pred)));
end
but the classes are output incorrectly.
In your answer you don't show that pred contains the classes and predProbs the probabilities. Just print them!
Training and testing data must have the same number of attributes. So in your case, even if you don't know the actual class of the test data, just use dummy values:
ytest = ones(size(xtest,1),1); %# dummy class values for test data
train = [xtrain ytrain];
test = [xtest ytest];
save ('train.txt','train','-ASCII');
save ('test.txt','test','-ASCII');
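The attribute-count requirement is easy to check before anything is saved. Below is a small Java sketch (Java only because the question already contains a Java snippet; DummyClassColumn, appendColumn and ones are made-up names, not Weka or MATLAB API) that does what [xtest ones(size(xtest,1),1)] does in MATLAB: it appends a dummy class column so that the test matrix ends up with as many columns as [xtrain ytrain].

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not part of Weka or MATLAB): appends a class column to a
 * feature matrix, mirroring MATLAB's [xtest ytest], so that the test matrix has
 * the same number of columns (attributes) as the training matrix.
 */
final class DummyClassColumn {

    static double[][] appendColumn(double[][] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("row count mismatch: " + x.length + " vs " + y.length);
        }
        double[][] out = new double[x.length][];
        for (int i = 0; i < x.length; i++) {
            out[i] = Arrays.copyOf(x[i], x[i].length + 1); // copy features, leave room for the class
            out[i][x[i].length] = y[i];                     // class value goes in the last column
        }
        return out;
    }

    /** Dummy class values, like MATLAB's ones(n,1). */
    static double[] ones(int n) {
        double[] y = new double[n];
        Arrays.fill(y, 1.0);
        return y;
    }

    public static void main(String[] args) {
        double[][] xtrain = {{5.1, 3.5}, {6.2, 2.9}};
        double[] ytrain = {1, 2};
        double[][] xtest = {{5.9, 3.0}};
        double[][] train = appendColumn(xtrain, ytrain);
        double[][] test = appendColumn(xtest, ones(xtest.length));
        // both matrices now have 2 feature columns + 1 class column
        System.out.println(train[0].length + " " + test[0].length);
    }
}
```

Running main prints 3 3, i.e. both matrices carry the same number of attributes.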
Don't forget to convert the class attribute to nominal when you load the test dataset (as you did for the training dataset):
filter = weka.filters.unsupervised.attribute.NumericToNominal();
filter.setOptions( weka.core.Utils.splitOptions('-R last') );
filter.setInputFormat(test);
test = filter.useFilter(test, filter);
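Why the conversion matters: after NumericToNominal, the class is no longer stored as the number itself but as the position of that value in the attribute's list of nominal values, and that list is built from the values present in the data being filtered. The Java sketch below simulates that idea; it is not Weka's implementation. It assumes the distinct values are kept in ascending order and that integer values become labels like "1" (matching the labels in the Weka output further down); NominalSketch and its methods are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/**
 * Simulation (not Weka code) of converting a numeric class column to a nominal
 * one: collect the distinct values (assumed ascending), use them as labels, and
 * store each value as a 0-based index into that label list.
 */
final class NominalSketch {

    /** Renders 1.0 as "1" so labels look like the ones in the Weka output. */
    static String label(double v) {
        return v == Math.rint(v) ? Long.toString((long) v) : Double.toString(v);
    }

    /** The nominal "header": sorted distinct values rendered as labels. */
    static List<String> header(double[] column) {
        TreeSet<Double> distinct = new TreeSet<>();
        for (double v : column) {
            distinct.add(v);
        }
        List<String> labels = new ArrayList<>();
        for (double v : distinct) {
            labels.add(label(v));
        }
        return labels;
    }

    /** The stored values: 0-based position of each value's label in the header. */
    static int[] indices(double[] column, List<String> header) {
        int[] idx = new int[column.length];
        for (int i = 0; i < column.length; i++) {
            idx[i] = header.indexOf(label(column[i]));
        }
        return idx;
    }

    public static void main(String[] args) {
        double[] ytrain = {2, 1, 3, 1};
        double[] ytestDummy = {1, 1, 1};
        List<String> trainHeader = header(ytrain);
        List<String> testHeader = header(ytestDummy);
        System.out.println(trainHeader + " " + Arrays.toString(indices(ytrain, trainHeader)));
        System.out.println(testHeader + " " + Arrays.toString(indices(ytestDummy, testHeader)));
    }
}
```

With these made-up values, main prints [1, 2, 3] [1, 0, 2, 0] for the training column but [1] [0, 0, 0] for an all-ones dummy column: filtering the two sets separately can produce class attributes whose value lists do not line up.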
Finally, you can call the trained J48 classifier to predict the class values for the test instances:
classes = zeros(numInst,1);
for i=1:numInst
classes(i) = classifier.classifyInstance(test.instance(i-1));
end
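It helps to know what classifyInstance returns for a nominal class: a double holding the 0-based index of the predicted value in the class attribute, not the class label itself, so 0 just means "the first class value". The Java sketch below maps such indices back to labels. The list of class values stands in for what train.classAttribute().value(i) would return in Weka; PredictionLabels and toLabels are illustrative names, and treating NaN as "no prediction" assumes Weka's NaN encoding of missing values.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Sketch of turning the doubles returned by classifyInstance into class labels.
 * Assumption: each prediction is a 0-based index into the class values of the
 * training header; here those values are a plain list of strings.
 */
final class PredictionLabels {

    static String[] toLabels(double[] predictions, List<String> classValues) {
        String[] labels = new String[predictions.length];
        for (int i = 0; i < predictions.length; i++) {
            if (Double.isNaN(predictions[i])) {
                labels[i] = "?";                                   // no prediction (missing value)
            } else {
                labels[i] = classValues.get((int) predictions[i]); // index -> label
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        List<String> classValues = Arrays.asList("1", "2", "3"); // e.g. the Iris class values
        double[] pred = {0.0, 2.0, 1.0};                          // raw classifyInstance results
        System.out.println(Arrays.toString(toLabels(pred, classValues)));
    }
}
```

main prints [1, 3, 2]. Reading the labels from the training header (train.classAttribute()) rather than from a test set that was filtered on its own avoids depending on the two class attributes having identical value lists.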
EDIT
It is difficult to tell without knowing the data you are working with.
So let me illustrate with a complete example. I am going to create the datasets in MATLAB from the Fisher Iris data (4 attributes, 150 instances, 3 classes).
%# load dataset (data + labels)
load fisheriris
X = meas;
Y = grp2idx(species);
%# partition the data into training/testing
c = cvpartition(Y, 'holdout',1/3);
xtrain = X(c.training,:);
ytrain = Y(c.training);
xtest = X(c.test,:);
ytest = Y(c.test); %# or dummy values
%# save as space-delimited text file
train = [xtrain ytrain];
test = [xtest ytest];
save train.txt train -ascii
save test.txt test -ascii
I should mention here that it is important to make sure that every class value is represented in each of the two datasets before using the NumericToNominal filter, because the filter builds the nominal values from the values it actually finds in the data. Otherwise, the train and test sets could be incompatible. What I mean is that you must have at least one instance of every class value in each set. Thus, if you are using dummy values, we can do this:
ytest = ones(size(xtest,1),1);
v = unique(Y);
ytest(1:numel(v)) = v;
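The same trick in Java, as a sketch: it builds dummy test labels that start with every distinct training label once and are filled with 1 elsewhere, then checks that the distinct values match the training labels. DummyLabels, coverAllClasses and uniqueSorted are hypothetical names. Like the MATLAB version, it assumes 1 is itself a valid class value (true for the 1..3 labels from grp2idx) and that there are at least as many test rows as classes.

```java
import java.util.Arrays;

/**
 * Java rendering of:
 *   ytest = ones(n,1); v = unique(Y); ytest(1:numel(v)) = v;
 * so every class value appears at least once among the dummy test labels.
 */
final class DummyLabels {

    /** Like MATLAB's unique: distinct values in ascending order. */
    static double[] uniqueSorted(double[] y) {
        return Arrays.stream(y).distinct().sorted().toArray();
    }

    static double[] coverAllClasses(int n, double[] y) {
        double[] v = uniqueSorted(y);                 // v = unique(Y)
        if (n < v.length) {
            throw new IllegalArgumentException("need at least " + v.length + " test rows");
        }
        double[] ytest = new double[n];
        Arrays.fill(ytest, 1.0);                      // ytest = ones(n,1)
        System.arraycopy(v, 0, ytest, 0, v.length);   // ytest(1:numel(v)) = v
        return ytest;
    }

    public static void main(String[] args) {
        double[] y = {2, 1, 3, 3, 1, 2};
        double[] ytest = coverAllClasses(5, y);
        System.out.println(Arrays.toString(ytest));
        // same distinct values as the training labels, so the nominal value lists agree
        System.out.println(Arrays.equals(uniqueSorted(ytest), uniqueSorted(y)));
    }
}
```

main prints [1.0, 2.0, 3.0, 1.0, 1.0] followed by true.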
Next, let's read the newly created files using the Weka API and convert the last attribute from numeric to nominal (to enable classification):
%# read train/test files using Weka
fName = 'train.txt';
loader = weka.core.converters.MatlabLoader();
loader.setFile( java.io.File(fName) );
train = loader.getDataSet();
train.setClassIndex( train.numAttributes()-1 );
fName = 'test.txt';
loader = weka.core.converters.MatlabLoader();
loader.setFile( java.io.File(fName) );
test = loader.getDataSet();
test.setClassIndex( test.numAttributes()-1 );
%# convert last attribute (class) from numeric to nominal
filter = weka.filters.unsupervised.attribute.NumericToNominal();
filter.setOptions( weka.core.Utils.splitOptions('-R last') );
filter.setInputFormat(train);
train = filter.useFilter(train, filter);
filter = weka.filters.unsupervised.attribute.NumericToNominal();
filter.setOptions( weka.core.Utils.splitOptions('-R last') );
filter.setInputFormat(test);
test = filter.useFilter(test, filter);
Now we train a J48 classifier and use it to predict the class of the test instances:
%# train a J48 tree
classifier = weka.classifiers.trees.J48();
classifier.setOptions( weka.core.Utils.splitOptions('-c last -C 0.25 -M 2') );
classifier.buildClassifier( train );
%# classify test instances
numInst = test.numInstances();
pred = zeros(numInst,1);
predProbs = zeros(numInst, train.numClasses());
for i=1:numInst
pred(i) = classifier.classifyInstance( test.instance(i-1) );
predProbs(i,:) = classifier.distributionForInstance( test.instance(i-1) );
end
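To make the contents of pred and predProbs concrete: each row of predProbs is a probability distribution over the class values of the training header, and pred(i) is an index into those same values. For a nominal class, Weka's default classifyInstance picks the entry with the largest probability, so the two should agree. The Java sketch below shows that relationship on made-up probability rows (not output from the Iris model); ArgMax is an illustrative name, and resolving ties to the lowest index is this sketch's own choice.

```java
import java.util.Arrays;

/**
 * Sketch: the predicted class index is the position of the largest entry in the
 * distribution returned by distributionForInstance (ties -> lowest index here).
 */
final class ArgMax {

    static int argMax(double[] dist) {
        int best = 0;
        for (int j = 1; j < dist.length; j++) {
            if (dist[j] > dist[best]) {
                best = j;
            }
        }
        return best;
    }

    static int[] predictFromProbs(double[][] predProbs) {
        int[] pred = new int[predProbs.length];
        for (int i = 0; i < predProbs.length; i++) {
            pred[i] = argMax(predProbs[i]);
        }
        return pred;
    }

    public static void main(String[] args) {
        double[][] predProbs = {  // made-up rows, one per test instance
            {1.0, 0.0, 0.0},
            {0.0, 0.667, 0.333},
            {0.0, 0.0, 1.0}
        };
        System.out.println(Arrays.toString(predictFromProbs(predProbs)));
    }
}
```

main prints [0, 1, 2]. In MATLAB, disp([pred predProbs]) shows the same pattern row by row for the real model.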
Finally, we evaluate the trained model's performance on the test data (this should look similar to what you see in Weka Explorer). Obviously, this only makes sense if the test instances carry their true class values (not dummy values):
eval = weka.classifiers.Evaluation(train);
eval.evaluateModel(classifier, test, javaArray('java.lang.Object',1));
fprintf('=== Run information ===\n\n')
fprintf('Scheme: %s %s\n', ...
char(classifier.getClass().getName()), ...
char(weka.core.Utils.joinOptions(classifier.getOptions())) )
fprintf('Relation: %s\n', char(train.relationName))
fprintf('Instances: %d\n', train.numInstances)
fprintf('Attributes: %d\n\n', train.numAttributes)
fprintf('=== Classifier model ===\n\n')
disp( char(classifier.toString()) )
fprintf('=== Summary ===\n')
disp( char(eval.toSummaryString()) )
disp( char(eval.toClassDetailsString()) )
disp( char(eval.toMatrixString()) )
The output in MATLAB for the example above:
=== Run information ===

Scheme: weka.classifiers.trees.J48 -C 0.25 -M 2
Relation: train.txt-weka.filters.unsupervised.attribute.NumericToNominal-Rlast
Instances: 100
Attributes: 5

=== Classifier model ===

J48 pruned tree
------------------

att_4 <= 0.6: 1 (33.0)
att_4 > 0.6
|   att_3 <= 4.8
|   |   att_4 <= 1.6: 2 (32.0)
|   |   att_4 > 1.6: 3 (3.0/1.0)
|   att_3 > 4.8: 3 (32.0)

Number of Leaves  : 4

Size of the tree : 7

=== Summary ===

Correctly Classified Instances          46               92      %
Incorrectly Classified Instances         4                8      %
Kappa statistic                          0.8802
Mean absolute error                      0.0578
Root mean squared error                  0.2341
Relative absolute error                 12.9975 %
Root relative squared error             49.6536 %
Coverage of cases (0.95 level)          92      %
Mean rel. region size (0.95 level)      34      %
Total Number of Instances               50

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
                 1        0        1          1       1          1         1
                 0.765    0        1          0.765   0.867      0.879     2
                 1        0.118    0.8        1       0.889      0.938     3
Weighted Avg.    0.92     0.038    0.936      0.92    0.919      0.939

=== Confusion Matrix ===

  a  b  c   <-- classified as
 17  0  0 |  a = 1
  0 13  4 |  b = 2
  0  0 16 |  c = 3
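As a cross-check of how the summary relates to the confusion matrix, this Java sketch recomputes accuracy, per-class TP rate (recall), precision and the kappa statistic from the 3x3 matrix above, with rows as actual classes and columns as predicted classes. These are the standard textbook formulas; ConfusionStats is an illustrative name, not a Weka class.

```java
import java.util.Locale;

/**
 * Recomputes a few summary statistics from a confusion matrix whose rows are
 * the actual classes and whose columns are the predicted classes.
 */
final class ConfusionStats {

    static double total(int[][] m) {
        double t = 0;
        for (int[] row : m) {
            for (int c : row) {
                t += c;
            }
        }
        return t;
    }

    /** Fraction of instances on the diagonal (correctly classified). */
    static double accuracy(int[][] m) {
        double diag = 0;
        for (int i = 0; i < m.length; i++) {
            diag += m[i][i];
        }
        return diag / total(m);
    }

    /** TP rate (= recall) of class k: correct predictions / actual instances of k. */
    static double tpRate(int[][] m, int k) {
        double rowSum = 0;
        for (int c : m[k]) {
            rowSum += c;
        }
        return m[k][k] / rowSum;
    }

    /** Precision of class k: correct predictions / all predictions of k. */
    static double precision(int[][] m, int k) {
        double colSum = 0;
        for (int[] row : m) {
            colSum += row[k];
        }
        return m[k][k] / colSum;
    }

    /** Cohen's kappa: (observed - chance agreement) / (1 - chance agreement). */
    static double kappa(int[][] m) {
        double n = total(m);
        double chance = 0;
        for (int k = 0; k < m.length; k++) {
            double rowSum = 0;
            double colSum = 0;
            for (int j = 0; j < m.length; j++) {
                rowSum += m[k][j];
                colSum += m[j][k];
            }
            chance += (rowSum / n) * (colSum / n);
        }
        return (accuracy(m) - chance) / (1 - chance);
    }

    public static void main(String[] args) {
        int[][] m = {{17, 0, 0}, {0, 13, 4}, {0, 0, 16}}; // confusion matrix from the output above
        System.out.printf(Locale.ROOT, "accuracy %.2f, kappa %.4f%n", accuracy(m), kappa(m));
        System.out.printf(Locale.ROOT, "TP rate b %.3f, precision c %.3f%n", tpRate(m, 1), precision(m, 2));
    }
}
```

main prints accuracy 0.92, kappa 0.8802 and TP rate b 0.765, precision c 0.800, which matches the 92 %, the Kappa statistic and the per-class rows reported by Weka.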