OpenCV SVM always predicts higher class label

Posted 2019-05-28 22:17

Question:

I'm using the OpenCV SVM implementation to make a binary prediction about the importance of an image feature. I'm therefore training it on positive and negative image features and looking for a classification in {0,1}.

The problem I'm encountering is that, after training, the SVM always predicts the class with the higher/greater class label. If I swap the labels in the training data set, the problem persists: it still predicts the class with the greater label. I've carefully inspected the generated label and training cv::Mat matrices and haven't found any issues there.

Below are my SVM class and the accompanying SVM parameters:

//Populate the SVM parameters
void SVM::setSVMParams()
{
    params.svm_type = cv::SVM::C_SVC;
    params.kernel_type = cv::SVM::RBF;
    params.term_crit = cv::TermCriteria(CV_TERMCRIT_ITER, 100, 1e-6);

    params_set = true;
}

//Train the SVM with the given data
void SVM::train(cv::Mat train_data, cv::Mat labels)
{
    //Set the SVM parameters if they haven't been already
    if (!params_set)
    {
        setSVMParams();
    }

    svm.train(train_data, labels, cv::Mat(), cv::Mat(), params);
}

//Based on training, predict the class of the given data
float SVM::predict(cv::Mat sample)
{
    return svm.predict(sample, false);
}

And here is the function responsible for generating the training data and the corresponding labels:

//Creates the training data and class labels for subsequent SVM training according to the supplied D threshold
void Matchings::createSVMTrainingObjects(const float t_D, const float positive_label, const float negative_label, bool print_info)
{
    cv::Mat train_data_l((int)matchings_list.size(), 132, CV_32FC1);
    cv::Mat labels_l((int)matchings_list.size(), 1, CV_32FC1);

    int num_pos = 0;
    int num_neg = 0;

    for (int i = 0; i < (int)matchings_list.size(); i++)
    {
        matching_d entry = matchings_list[i];

        //Important feature: assign the positive label
        if (entry.D > t_D)
        {
            labels_l.at<float>(i) = positive_label;

            num_pos++;
        }
        //Unimportant feature: assign the negative label
        else
        {
            labels_l.at<float>(i) = negative_label;

            num_neg++;
        }

        int j = 0;

        //Copy the feature into the current row of the OpenCV matrix
        train_data_l.at<float>(i, j++) = entry.feature.x;
        train_data_l.at<float>(i, j++) = entry.feature.y;
        train_data_l.at<float>(i, j++) = entry.feature.scale;
        train_data_l.at<float>(i, j++) = entry.feature.angle;
        for (int k = 0; k < 128; k++)
        {
            train_data_l.at<float>(i, j + k) = entry.feature.vec[k];
        }
    }

    std::cout << "For training: #+ves=" << num_pos << ", #-ves=" << num_neg << std::endl;

    train_data = train_data_l;
    labels = labels_l;
}

And finally, here is the function that uses the SVM predictions to retain only the important image features:

matchingslist ASIFT::filterFeaturesWithSVM(matchingslist matchings, SVM& svm)
{
    matchingslist new_matchings;

    for (int i = 0; i < (int)matchings.size(); i++)
    {
        cv::Mat first = Utility::keypointToMat(matchings[i].first);
        cv::Mat second = Utility::keypointToMat(matchings[i].second);

        //If both features are of importance, retain them
        if (svm.predict(first) == 1.0f && svm.predict(second) == 1.0f)
        {
            new_matchings.push_back(matchings[i]);
        }
        else
        {
            std::cout << "Feature removed" << std::endl;
        }
    }

    return new_matchings;
}

Answer 1:

One main problem with the approach is that you do not set the hyperparameters of your SVM. You use an RBF kernel, so the defaults apply: probably C=1 and gamma=1/d (or 1/mean ||x||^2), as these are the default values in most SVM implementations.

These parameters are critical to building a valid model. In particular, if your C value is too low (1 might be, depending on many properties of the data), the SVM builds a trivial model that simply always predicts one of the classes.
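For illustration, here is what setting them explicitly looks like with the CvSVMParams struct the question's code already uses; the particular values are placeholders to be tuned, not recommendations:

//In setSVMParams(), alongside svm_type and kernel_type:
params.C = 10.0;      //placeholder value - must be tuned on your own data
params.gamma = 0.01;  //placeholder value - must be tuned on your own data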

What should you do? Check multiple values of both C and gamma. What do these parameters mean?

  • C (yours is 1) is the weight of misclassification: the greater C is, the harder the SVM tries to fit the training data exactly, possibly at the cost of overfitting.
  • gamma (yours is the default) is the inverse of twice the variance of your RBF kernel, K(x, y) = exp(-gamma ||x - y||^2). In other words, the greater gamma is, the smaller the Gaussians, and so the more "local" your method is in the geometric sense. Again, a large gamma helps you minimize the training error (bias) but leads to higher testing error (variance).

Correctly selecting the bias-variance tradeoff is a crucial element of machine learning, and for an RBF SVM you control it through the two parameters above. Play around with them, checking both the training-set error and the testing-set error to see what is happening. If your training-set error is large, increase C and/or gamma. Once the training-set error is fine, look at the testing set; if its error is too big, try decreasing the values, and so on. This is usually done automatically through internal cross-validation with a grid search over the parameters, as sketched below.
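In the OpenCV 2.x API the question uses, CvSVM::train_auto does exactly this: a k-fold cross-validated grid search over C, gamma and the other kernel parameters. A minimal sketch reusing the question's wrapper (the fold count and parameter grids shown are OpenCV's defaults, not tuned choices):

//Train the SVM, letting OpenCV grid-search C and gamma via cross-validation
void SVM::train(cv::Mat train_data, cv::Mat labels)
{
    if (!params_set)
    {
        setSVMParams();
    }

    //10-fold cross-validation over OpenCV's default parameter grids;
    //the selected values can be read back afterwards with svm.get_params()
    svm.train_auto(train_data, labels, cv::Mat(), cv::Mat(), params, 10);
}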

Check out materials on model selection and hyperparameter optimization.

Furthermore, you fix the number of iterations:

params.term_crit = cv::TermCriteria(CV_TERMCRIT_ITER, 100, 1e-6);

whereas for an SVM you should never do that. Let it converge (or at least allow something like 100,000 iterations); after just 100 steps the SVM may not have come even close to convergence, and will thus produce a trivial model.
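For example (adding CV_TERMCRIT_EPS makes the tolerance the effective stopping condition, with the large iteration count only as a safety bound):

params.term_crit = cv::TermCriteria(CV_TERMCRIT_ITER + CV_TERMCRIT_EPS, 100000, 1e-6);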