Question:
Hi, I am performing SVM classification using SMO with an RBF kernel. Now I want to select the C and sigma values using grid search and cross-validation. I am new to kernel functions, so please help with a step-by-step process.
Answer 1:
- Pick some values for C and sigma that you think are interesting. E.g., C = {1, 10, 100, 1000} and sigma = {.01, .1, 1} (I'm just making these up).
- Divide the training set into k (e.g. 10) parts, preferably in a stratified way.
- Loop over all pairs of C and sigma values.
- Loop over all k parts of your training set. Hold the current part out, train a classifier on all of the other parts combined, then test on the held-out part.
- Keep track of some score (accuracy, F1, or whatever you want to optimize), averaged over the k folds for each (C, sigma) pair.
- Return the (C, sigma) pair with the best score computed above.
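The steps above can be sketched in plain Python. `train_and_score` is a hypothetical hook, not something from the answer: it is where you would call your own SMO trainer with an RBF kernel and return the score you care about. The round-robin stratified split is just one simple way to build the k parts.

```python
import random
from collections import defaultdict
from itertools import product
from statistics import mean


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds, dealing each class out round-robin
    so every fold keeps roughly the original class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    pos = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        for i in indices:
            folds[pos % k].append(i)
            pos += 1
    return folds


def grid_search_cv(X, y, Cs, sigmas, train_and_score, k=10):
    """Return (best_C, best_sigma, best_mean_score) over the grid.

    train_and_score(X_tr, y_tr, X_te, y_te, C, sigma) is a hypothetical hook:
    train your SMO/RBF classifier on the training part and return a score
    (accuracy, F1, ...) on the held-out part. Higher is assumed to be better.
    """
    folds = stratified_folds(y, k)
    best = None
    for C, sigma in product(Cs, sigmas):  # loop over all (C, sigma) pairs
        scores = []
        for held_out in folds:  # hold each part out exactly once
            held = set(held_out)
            train_idx = [i for i in range(len(y)) if i not in held]
            scores.append(train_and_score(
                [X[i] for i in train_idx], [y[i] for i in train_idx],
                [X[i] for i in held_out], [y[i] for i in held_out],
                C, sigma))
        avg = mean(scores)  # average score across the k folds
        if best is None or avg > best[2]:
            best = (C, sigma, avg)
    return best
```

For example, with scikit-learn's `SVC` (whose libsvm backend uses an SMO-type solver), the hook could build `SVC(kernel="rbf", C=C, gamma=1 / (2 * sigma**2))`, fit it on the training part, and return `.score(X_te, y_te)`. The `gamma` conversion assumes your kernel is written as exp(-||x - x'||^2 / (2 sigma^2)).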
Answer 2:
Read A Practical Guide to Support Vector Classification by Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin. They address this exact issue and explain methods for performing a grid search for parameter selection. http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf
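The guide recommends scaling each feature (e.g. to [-1, 1] or [0, 1]) and trying exponentially growing sequences such as C = 2^-5, 2^-3, ..., 2^15 and gamma = 2^-15, 2^-13, ..., 2^3, then refining with a finer grid around the best region. Below is a sketch of the coarse stage using scikit-learn, which is an assumption (the guide itself uses LIBSVM's tools), with the iris data as a stand-in dataset. scikit-learn writes the RBF kernel as exp(-gamma ||x - x'||^2), so gamma = 1/(2 sigma^2) if your kernel uses sigma.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Coarse grid of exponentially growing values:
# C = 2^-5, 2^-3, ..., 2^15 and gamma = 2^-15, 2^-13, ..., 2^3.
C_grid = 2.0 ** np.arange(-5, 16, 2)
gamma_grid = 2.0 ** np.arange(-15, 4, 2)

# Scaling sits inside the pipeline, so each CV fold is scaled
# using statistics from its own training part only.
model = make_pipeline(MinMaxScaler(feature_range=(-1, 1)), SVC(kernel="rbf"))
search = GridSearchCV(
    model,
    param_grid={"svc__C": C_grid, "svc__gamma": gamma_grid},
    cv=5,  # an integer cv gives stratified k-fold for classifiers
    scoring="accuracy",
)

X, y = load_iris(return_X_y=True)  # stand-in dataset; use your own data here
search.fit(X, y)

best_C = search.best_params_["svc__C"]
best_gamma = search.best_params_["svc__gamma"]
best_sigma = np.sqrt(1.0 / (2.0 * best_gamma))  # only if your kernel uses sigma
print(best_C, best_gamma, best_sigma, search.best_score_)
```

A second, finer grid (e.g. steps of 2^0.25 around `best_C` and `best_gamma`) would follow the same pattern.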
Answer 3:
I will just add a little bit of explanation to larsmans' answer.
The C parameter is a regularization/slack parameter. Smaller values force the weights to be small; as C grows, the allowed range of weights widens. As a result, larger C values increase the penalty for misclassification and thus reduce the classification error rate on the training data (which may lead to overfitting). Training time usually increases as you increase C, while the number of support vectors usually decreases, because a narrower margin leaves fewer training points inside it.
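The standard soft-margin formulation makes C's role explicit. In the primal, C weights the total slack against the margin term; in the dual, which is what SMO solves, C becomes the upper bound on each multiplier \alpha_i, which is the sense in which a larger C widens the allowed range of weights. The RBF kernel is shown with sigma; libsvm and scikit-learn parameterize it with \gamma = 1/(2\sigma^2) instead.

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad y_i\bigl(w^\top\phi(x_i) + b\bigr) \ge 1 - \xi_i,\quad \xi_i \ge 0

\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j K(x_i, x_j)
\quad\text{s.t.}\quad 0 \le \alpha_i \le C,\quad \sum_{i=1}^{n}\alpha_i y_i = 0

K(x, x') = \exp\!\left(-\frac{\lVert x - x'\rVert^2}{2\sigma^2}\right)
```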
You may also find it useful to read Extending SVM to a Soft Margin Classifier by K.K. Chin.
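A quick sweep over C lets you check these effects on your own data. This sketch assumes scikit-learn, uses iris as a stand-in dataset, and fixes gamma = 0.5 (i.e. sigma = 1) purely for illustration; it reports training accuracy and the total number of support vectors for each C.

```python
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # stand-in dataset

results = []
for C in [0.01, 1.0, 100.0, 10000.0]:
    # Fixed kernel width: gamma = 1 / (2 * sigma^2) with sigma = 1 (an assumption).
    model = make_pipeline(MinMaxScaler(feature_range=(-1, 1)),
                          SVC(kernel="rbf", C=C, gamma=0.5))
    model.fit(X, y)
    svc = model.named_steps["svc"]
    train_acc = model.score(X, y)       # accuracy on the training data itself
    n_sv = int(svc.n_support_.sum())    # support vectors summed over all classes
    results.append((C, train_acc, n_sv))
    print(f"C={C:g}: training accuracy={train_acc:.3f}, support vectors={n_sv}")
```

Training accuracy on its own says nothing about generalization; use the cross-validated score to pick C.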
Answer 4:
You can also use Uniform Design model selection, which reduces the number of (C, sigma) tuples you need to check. The paper that explains it is "Model selection for support vector machines via uniform design" by Chien-Ming Huang. Some implementations in Python exist in ssvm 0.2.
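The idea is to evaluate a small set of (C, gamma) points spread evenly over the search region instead of a full grid. The paper's own UD tables and its nested search are not reproduced here, and ssvm's interface is not used. As a rough illustration, this sketch builds a two-factor good-lattice-point design, one standard way to construct uniform designs, choosing the generator with the lowest centered L2 discrepancy. The 13-run size and the log2 ranges (borrowed from the Hsu et al. guide) are assumptions.

```python
from math import gcd


def centered_l2_discrepancy(points):
    """Squared centered L2 discrepancy of points in [0, 1]^s; lower is more uniform."""
    n, s = len(points), len(points[0])
    term1 = (13.0 / 12.0) ** s
    term2 = 0.0
    for x in points:
        p = 1.0
        for xk in x:
            d = abs(xk - 0.5)
            p *= 1.0 + 0.5 * d - 0.5 * d * d
        term2 += p
    term3 = 0.0
    for x in points:
        for z in points:
            p = 1.0
            for xk, zk in zip(x, z):
                p *= 1.0 + 0.5 * abs(xk - 0.5) + 0.5 * abs(zk - 0.5) - 0.5 * abs(xk - zk)
            term3 += p
    return term1 - 2.0 / n * term2 + term3 / (n * n)


def glp_design_2d(n):
    """n-run, two-factor good-lattice-point design with rows (i, i*h mod n),
    h coprime to n, keeping the h with the lowest discrepancy.
    Levels are mapped to cell centres (u - 0.5) / n in (0, 1)."""
    best = None
    for h in range(1, n):
        if gcd(h, n) != 1:
            continue
        pts = []
        for i in range(1, n + 1):
            u2 = (i * h) % n or n  # residue 0 becomes level n
            pts.append(((i - 0.5) / n, (u2 - 0.5) / n))
        cd = centered_l2_discrepancy(pts)
        if best is None or cd < best[0]:
            best = (cd, pts)
    return best[1]


def ud_grid(n, log2_C_range=(-5, 15), log2_gamma_range=(-15, 3)):
    """Map the design onto (C, gamma) pairs on a log2 scale."""
    (c_lo, c_hi), (g_lo, g_hi) = log2_C_range, log2_gamma_range
    return [(2.0 ** (c_lo + a * (c_hi - c_lo)), 2.0 ** (g_lo + b * (g_hi - g_lo)))
            for a, b in glp_design_2d(n)]
```

`ud_grid(13)` yields 13 (C, gamma) pairs to cross-validate instead of the 110 in the coarse grid above; each level of each parameter is used exactly once. Convert gamma to sigma with sigma = sqrt(1 / (2 * gamma)) if your kernel uses sigma.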