Using scikit-learn, how do I learn a SVM over a sm

Posted 2019-07-17 03:50

Question:

Using scikit-learn, I have built a support vector machine for a basic handwritten digit detection problem.

My full data set consists of 235 observations, each with 1025 features. I know that one advantage of a support vector machine is that it suits situations like this, where there is a modest number of observations with a large number of features.

After my SVM is created, I look at my confusion matrix (below)...

Confusion Matrix:
[[ 6  0]
 [ 0 30]]

...and realize that holding out 15% of my data for testing (i.e., 36 observations) is not enough.

My problem is this: how can I work around this small-data issue using cross-validation?

Answer 1:

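This is exactly what cross-validation (and its generalizations, such as the .632 bootstrap estimator, Err^0.632) is for. A hold-out set is reasonable only when you have a large quantity of data. With just 36 test observations, a single split gives a high-variance estimate, whereas k-fold cross-validation tests on every observation exactly once.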
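As a rough illustration, stratified k-fold cross-validation for an SVM in scikit-learn might look like the sketch below. The data here is a synthetic stand-in produced by `make_classification` (235 observations, 1025 features, two imbalanced classes), since the original data set is not shown. The linear kernel, `C=1.0`, the `StandardScaler` step, and the choice of 5 folds are assumptions for the example, not settings taken from the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import (
    StratifiedKFold,
    cross_val_predict,
    cross_val_score,
)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic placeholder data (an assumption): swap in your own X and y.
X, y = make_classification(
    n_samples=235,
    n_features=1025,
    n_informative=20,
    weights=[0.2, 0.8],
    random_state=0,
)

# Keeping the scaler inside the pipeline means it is refit on each
# training fold, so no information leaks from the held-out fold.
model = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))

# Stratification keeps the class ratio roughly equal across folds,
# which matters when one class is much smaller than the other.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# One accuracy score per fold.
scores = cross_val_score(model, X, y, cv=cv)
print("accuracy per fold:", np.round(scores, 3))
print("mean accuracy: %.3f (std %.3f)" % (scores.mean(), scores.std()))

# Out-of-fold predictions: each observation is predicted by a model
# that never saw it during training, so the confusion matrix covers
# all 235 observations instead of 36.
y_pred = cross_val_predict(model, X, y, cv=cv)
cm = confusion_matrix(y, y_pred)
print("Confusion Matrix:")
print(cm)
```

The spread of the per-fold scores gives a sense of how much the estimate depends on the particular split, which a single 15% hold-out cannot show.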