Labeling one class for cross validation in libsvm

Posted 2019-02-20 17:26

I want to use one-class classification using LibSVM in MATLAB.

I want to train data and use cross validation, but I don't know what I have to do to label the outliers.

If for example I have this data:

trainData =  [1,1,1; 1,1,2; 1,1,1.5; 1,1.5,1; 20,2,3; 2,20,2; 2,20,5; 20,2,2];
labelTrainData = [-1 -1 -1 -1 0 0 0 0]';  % column vector: one label per row of trainData

(The first four are examples of the 1 class, the other four are examples of outliers, just for the cross validation)

And I train the model using this:

model = svmtrain(labelTrainData, trainData , '-s 2 -t 0 -d 3 -g 2.0 -r 2.0 -n 0.5 -m 40.0 -c 0.0 -e 0.0010 -p 0.1 -v 2' );

I'm not sure which value to use to label the one-class data and which to use for the outliers. Does anyone know how to do this?

Thanks in advance. -Jessica

1 answer
Rolldiameter
#2 · 2019-02-20 17:54

According to http://www.joint-research.org/wp-content/uploads/2011/07/lukashevich2009Using-One-class-SVM-Outliers-Detection.pdf, "Due to the lack of class labels in the one-class SVM, it is not possible to optimize the kernel parameters using cross-validation". However, according to the LIBSVM FAQ, that is not quite correct:

Q: How do I choose parameters for one-class SVM as training data are in only one class? You have pre-specified true positive rate in mind and then search for parameters which achieve similar cross-validation accuracy.

Furthermore the README for the libsvm source says of the input data: "For classification, label is an integer indicating the class label ... For one-class SVM, it's not used so can be any number."
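A minimal sketch of the FAQ's suggestion, assuming the LIBSVM MATLAB interface is first on the path (older Statistics Toolbox releases ship their own svmtrain). Although training ignores the labels, the -v accuracy is still computed by comparing each held-out prediction with its label, and one-class prediction returns +1 for points inside the learned region, so labeling every training row +1 makes the reported accuracy the fraction of held-out inliers accepted. The target rate and the parameter grids are hypothetical values chosen only for illustration.

```matlab
% One-class rows from the question; outliers are left out of training.
inliers = [1,1,1; 1,1,2; 1,1,1.5; 1,1.5,1];
labels  = ones(size(inliers, 1), 1);   % column vector, all +1

targetRate = 0.9;                      % hypothetical desired true positive rate
bestGap = Inf;
for nu = [0.05 0.1 0.25 0.5]           % hypothetical grid
    for gamma = [0.01 0.1 1]           % hypothetical grid
        opts = sprintf('-s 2 -t 2 -n %g -g %g -v 2 -q', nu, gamma);
        % With -v, svmtrain returns the cross-validation accuracy
        % (in percent) instead of a model struct.
        cvAcc = svmtrain(labels, inliers, opts);
        gap = abs(cvAcc / 100 - targetRate);
        if gap < bestGap
            bestGap = gap; bestNu = nu; bestGamma = gamma;
        end
    end
end
fprintf('closest to target: nu = %g, gamma = %g\n', bestNu, bestGamma);
```

With only four rows the folds are too small to be meaningful; the loop is meant to show the mechanics, not a realistic search.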

I think your outliers should not be included in the training data. LIBSVM ignores the training labels for one-class SVM, so it has no way to tell the outliers from the good data. What you are trying to do is find a hypersphere that contains the good data but not the outliers. If you train with outliers in the data, LIBSVM will try to find a hypersphere that includes them, which is exactly what you don't want. So you will need a training dataset without outliers, a validation dataset with outliers for choosing parameters, and a final test dataset to see whether your model generalizes.
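The train/validation split described above might be sketched as follows, again assuming the LIBSVM MATLAB interface is on the path. The inlier samples are synthetic and made up purely for illustration; the four outliers are the ones from the question. The parameter grids and the balanced-accuracy score are my own choices, not something LIBSVM prescribes.

```matlab
rng(0);                                        % reproducible synthetic data
inliers  = 1 + 0.25 * randn(60, 3);            % hypothetical "good" samples
outliers = [20,2,3; 2,20,2; 2,20,5; 20,2,2];   % outliers from the question

trainX = inliers(1:40, :);                     % training set: inliers only
trainY = ones(size(trainX, 1), 1);             % ignored by one-class training

valX = [inliers(41:60, :); outliers];          % held-out inliers plus outliers
valY = [ones(20, 1); -ones(size(outliers, 1), 1)];   % +1 inlier, -1 outlier

bestScore = -Inf;
for nu = [0.01 0.05 0.1 0.2]                   % hypothetical grid
    for gamma = [0.01 0.1 1 10]                % hypothetical grid
        opts  = sprintf('-s 2 -t 2 -n %g -g %g -q', nu, gamma);
        model = svmtrain(trainY, trainX, opts);
        % Request three outputs; some LIBSVM versions reject exactly two.
        [pred, ~, ~] = svmpredict(valY, valX, model, '-q');
        tpr = mean(pred(valY ==  1) ==  1);    % inliers accepted
        tnr = mean(pred(valY == -1) == -1);    % outliers rejected
        score = (tpr + tnr) / 2;               % balanced accuracy
        if score > bestScore
            bestScore = score; bestNu = nu; bestGamma = gamma;
        end
    end
end
fprintf('best nu = %g, gamma = %g, balanced accuracy = %.2f\n', ...
        bestNu, bestGamma, bestScore);
```

Balanced accuracy is used because the validation set has far more inliers than outliers, so plain accuracy would reward a model that accepts everything. The chosen parameters should then be checked once on a separate test set that played no part in the search.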
