I want to use one-class classification using LibSVM in MATLAB.
I want to train data and use cross validation, but I don't know what I have to do to label the outliers.
If for example I have this data:
trainData = [1,1,1; 1,1,2; 1,1,1.5; 1,1.5,1; 20,2,3; 2,20,2; 2,20,5; 20,2,2];
labelTrainData = [-1 -1 -1 -1 0 0 0 0];
(The first four are examples of the 1 class, the other four are examples of outliers, just for the cross validation)
And I train the model using this:
model = svmtrain(labelTrainData, trainData , '-s 2 -t 0 -d 3 -g 2.0 -r 2.0 -n 0.5 -m 40.0 -c 0.0 -e 0.0010 -p 0.1 -v 2' );
I'm not sure which value use to label the 1-class data and what to use to the outliers. Does someone knows how to do this?.
Thanks in advance.
-Jessica
According to http://www.joint-research.org/wp-content/uploads/2011/07/lukashevich2009Using-One-class-SVM-Outliers-Detection.pdf "Due to the lack of class labels in
the one-class SVM, it is not possible to optimize the kernel
parameters using cross-validation".
However, according to the LIBSVM FAQ that is not quite correct:
Q: How do I choose parameters for one-class SVM as training data are in only one class?
You have pre-specified true positive rate in mind and then search for parameters which achieve similar cross-validation accuracy.
Furthermore the README for the libsvm source says of the input data:
"For classification, label is an integer indicating the class label ... For one-class SVM, it's not used so can be any number."
I think your outliers should not be included in the training data - libsvm will ignore the training labels. What you are trying to do is find a hypersphere that contains good data but not outliers. If you train with outliers in the data LIBSVM will try yo find a hypersphere that includes the outliers, which is exactly what you don't want. So you will need a training dataset without outliers, a validation dataset with outliers for choosing parameters, and a final test dataset to see whether your model generalizes.