I'm creating an application for classifying humans in images of urban settings.
I train a classifier in the following manner:
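#include <cstdio>
#include <iostream>
#include <vector>
#include <opencv2/core/core.hpp>
#include <opencv2/ml/ml.hpp>
using namespace std;
using namespace cv;
int main (int argc, char **argv)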
{
/* STEP 2. Opening the file */
//1. Declare a structure to keep the data
CvMLData cvml;
//2. Read the file
cvml.read_csv ("directory/train_rand.csv");
//3. Indicate which column is the response
cvml.set_response_idx (0);
/* STEP 3. Splitting the samples */
//1. Select 4000 for the training
CvTrainTestSplit cvtts (4000, true);
//2. Assign the division to the data
cvml.set_train_test_split (&cvtts);
printf ("Training ... ");
/* STEP 4. The training */
//1. Declare the classifier
CvBoost boost;
//2. Train it with 100 weak classifiers (the second CvBoostParams argument is weak_count)
boost.train (&cvml, CvBoostParams (CvBoost::REAL,100, 0, 1, false, 0),
false);
/* STEP 5. Calculating the testing and training error */
// 1. Declare a couple of vectors to save the predictions of each sample
std::vector<float> train_responses, test_responses;
// 2. Calculate the training error
float fl1 = boost.calc_error (&cvml, CV_TRAIN_ERROR, &train_responses);
// 3. Calculate the test error
float fl2 = boost.calc_error (&cvml, CV_TEST_ERROR, &test_responses);
cout<<"Error train: "<<fl1<<endl;
cout<<"Error test: "<<fl2<<endl;
/* STEP 6. Save your classifier */
// Save the trained classifier
boost.save ("./trained_boost_4000samples-100ftrs.xml", "boost");
return 0;
}
train_rand.csv is a file where the first column is the category and the remaining columns are the features of the problem. For example, with three features, each representing the average red, blue and green value per pixel in the image, the csv file would look like this. Note that the first column holds a character, so OpenCV recognizes it as a category.
B,124.34,45.4,12.4
B,64.14,45.23,3.23
B,42.32,125.41,23.8
R,224.4,35.34,163.87
R,14.55,12.423,89.67
...
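To make the layout explicit, here is a minimal sketch of how one such row splits into a label and a feature vector. It uses only the standard library and is not how CvMLData parses the file internally; the names LabeledRow and parse_row are made up for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for one parsed row: label first, features after.
struct LabeledRow {
    std::string label;            // first column, e.g. "B" or "R"
    std::vector<float> features;  // remaining columns
};

// Split one comma-separated line into its label and numeric features.
// Assumes every cell after the first is a valid floating-point number.
LabeledRow parse_row(const std::string& line) {
    LabeledRow row;
    std::stringstream ss(line);
    std::string cell;
    bool first = true;
    while (std::getline(ss, cell, ',')) {
        if (first) {
            row.label = cell;  // category column
            first = false;
        } else {
            row.features.push_back(std::stof(cell));
        }
    }
    return row;
}
```

Calling parse_row on the first sample line above would give the label B and three features, so a row always has one more column than it has features.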
For my actual problem, I'm using 100 features and 8000 samples. I train the classifier with half of the data and test it with the rest.
After training, I get a test error of around 5% (which is pretty good for only 100 features).
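For reference, this is what I understand that error figure to mean: the share of test samples whose predicted class differs from the true class, expressed as a percentage. The sketch below is standard-library only; error_percent is a hypothetical helper, and that calc_error computes the same quantity for classification is my assumption.

```cpp
#include <cstddef>
#include <vector>

// Percentage of samples whose predicted label differs from the true label.
// Returns 0 if the input is empty or the two vectors differ in length.
float error_percent(const std::vector<int>& predicted,
                    const std::vector<int>& actual) {
    if (predicted.empty() || predicted.size() != actual.size()) return 0.0f;
    std::size_t wrong = 0;
    for (std::size_t i = 0; i < predicted.size(); ++i) {
        if (predicted[i] != actual[i]) ++wrong;
    }
    return 100.0f * static_cast<float>(wrong) /
           static_cast<float>(predicted.size());
}
```

With 4000 test samples, an error of 5% would correspond to about 200 misclassified samples.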
Now I want to use the classifier on new data:
CvBoost boost;
boost.load("directory/trained_boost_4000samples-100ftrs.xml");
float x = boost.predict(SampleData,Mat(),Range::all(),false,false);
cout<<x;
I'm running this code over thousands of samples and it always outputs the same value, which is 2. I really don't understand what I'm doing wrong here. Even if I had trained the classifier incorrectly, I wouldn't expect it to assign the same class 100% of the time, and the test error I calculated before suggests that the classifier should work fine.
One thing that is bothering me is that SampleData has to have the same number of columns as the samples I used to train. The training data has 100 feature columns + 1 response column. If I run the classifier on a sample with only 100 features, it throws an exception saying that the sizes don't match. If I run it with 101 features (the extra one being absolutely arbitrary), it works, but the results don't make any sense.
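One layout I could check is a sample that mirrors a training row exactly, with a placeholder in the position the response occupied (index 0 in my CSV) and the 100 features after it. The sketch below only builds that vector with the standard library. It is not a confirmed fix: pad_sample and the placeholder value are hypothetical, and that the loaded model expects this layout is an assumption.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: lay out a prediction sample with the same number of
// columns as a training row, putting a placeholder where the response was.
// Assumption: the model wants the training layout (response column included).
std::vector<float> pad_sample(const std::vector<float>& features,
                              std::size_t response_idx,
                              float placeholder = 0.0f) {
    std::vector<float> sample(features);
    // Clamp so an out-of-range index appends the placeholder at the end.
    if (response_idx > sample.size()) response_idx = sample.size();
    sample.insert(sample.begin() + static_cast<std::ptrdiff_t>(response_idx),
                  placeholder);
    return sample;
}
```

For 100 features and response_idx 0 this yields 101 values, which would then have to be copied into a 1 x 101 single-channel float matrix before being passed to boost.predict.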
Can anyone help me with this? Thanks in advance!
Regards