How do we compute accuracy for clusters using Weka?
I can use this formula:
Accuracy (A) = (tp + tn) / total number of samples
But how do I find the true positives, false positives, true negatives, and false negatives in the output of an experiment in Weka?
There are a few different clustering modes in Weka:
Use training set (default): After clustering, Weka assigns the training instances to the clusters it built and reports the percentage of instances that fall in each cluster. For example, X% in cluster 0, Y% in cluster 1, and so on.
Supplied test set: Weka can evaluate clusterings on separate test data if the cluster representation is probabilistic, as with the EM algorithm.
Clustering evaluation using classes (called "Classes to clusters evaluation" in Weka): In this mode Weka first ignores the class attribute and generates the clustering. During testing, it assigns a class label to each cluster based on the majority value of the class attribute within that cluster. Finally, it computes the classification error and shows the corresponding confusion matrix.
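To make the last mode concrete, here is a minimal plain-Java sketch of the majority-vote idea described above. It does not call Weka: the class name ClassesToClustersSketch, its methods, and the sample arrays are all made up for illustration, and Weka's own implementation may map clusters to classes differently. With two classes, the resulting confusion matrix gives tp, tn, fp, and fn directly.

```java
// Sketch only: a plain-Java imitation of the majority-vote evaluation described above.
// It is not Weka's code; all names and data here are hypothetical.
final class ClassesToClustersSketch {

    // Returns confusion[actual][predicted], where "predicted" is the majority
    // class of the cluster the instance was assigned to.
    static int[][] confusionMatrix(int[] clusterOf, int[] classOf,
                                   int numClusters, int numClasses) {
        if (clusterOf.length != classOf.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        // counts[k][c] = number of instances in cluster k whose true class is c
        int[][] counts = new int[numClusters][numClasses];
        for (int i = 0; i < clusterOf.length; i++) {
            counts[clusterOf[i]][classOf[i]]++;
        }
        int[][] confusion = new int[numClasses][numClasses];
        for (int k = 0; k < numClusters; k++) {
            // Majority class of cluster k (ties go to the lower class index).
            int label = 0;
            for (int c = 1; c < numClasses; c++) {
                if (counts[k][c] > counts[k][label]) {
                    label = c;
                }
            }
            // Every instance in cluster k is "predicted" to have that label.
            for (int c = 0; c < numClasses; c++) {
                confusion[c][label] += counts[k][c];
            }
        }
        return confusion;
    }

    // Accuracy = correctly labelled instances (the diagonal) / all instances.
    static double accuracy(int[][] confusion) {
        int correct = 0;
        int total = 0;
        for (int a = 0; a < confusion.length; a++) {
            for (int p = 0; p < confusion[a].length; p++) {
                total += confusion[a][p];
                if (a == p) {
                    correct += confusion[a][p];
                }
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    public static void main(String[] args) {
        // Made-up data: 10 instances, 2 clusters, 2 classes (class 1 = "positive").
        int[] clusterOf = {0, 0, 0, 0, 1, 1, 1, 1, 1, 1};
        int[] classOf   = {0, 0, 0, 1, 1, 1, 1, 1, 0, 0};
        int[][] m = confusionMatrix(clusterOf, classOf, 2, 2);
        int tn = m[0][0], fp = m[0][1], fn = m[1][0], tp = m[1][1];
        System.out.println("tp=" + tp + " tn=" + tn + " fp=" + fp + " fn=" + fn);
        System.out.println("accuracy=" + accuracy(m));
    }
}
```

For the made-up data this prints tp=4 tn=3 fp=2 fn=1 and accuracy=0.7, that is (tp + tn) / total = 7 / 10. The same reading applies to the confusion matrix Weka prints in this mode: the diagonal holds the correctly labelled instances.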
Take a look at the principles of cross-validation. Use the crossValidateModel and evaluateClusterer methods of ClusterEvaluation in your Java code, or experiment with this directly in the Weka GUI.