HOG detector: relation between detected roi size a

I'm experimenting with people detection using OpenCV and the HOGDescriptor C++ class with HOGDescriptor::getDefaultPeopleDetector(). I used the sample program peopledetect.cpp from the samples/cpp directory of the OpenCV 2.4.3 repository and tested it on some of the INRIA dataset images, and it works quite well.

Now I want to try it on the images I actually have to work with, but even when I change the parameters it doesn't find anything.

I suppose this is because the pedestrians in my images are much smaller than the INRIA ones, so it would probably be better to train a new detector. But before doing that, here is my question:

Is that right? Is there a strict relationship between the size of the images used for training and the size of the objects that can be detected? That would mean the HOG detector is not really a scale-invariant method. In particular, what is the best object size for the default HOGDescriptor::getDefaultPeopleDetector()? Do I have to train a new detector to detect much smaller people?

Here is the peopledetect.cpp I'm using:

#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/objdetect/objdetect.hpp"
#include "opencv2/highgui/highgui.hpp"

#include <stdio.h>
#include <string.h>
#include <ctype.h>

#include <iostream>

using namespace cv;
using namespace std;

// static void help()
// {
//     printf(
//             "\nDemonstrate the use of the HoG descriptor using\n"
//             "  HOGDescriptor::hog.setSVMDetector(HOGDescriptor::getDefaultPeopleDetector());\n"
//             "Usage:\n"
//             "./peopledetect (<image_filename> | <image_list>.txt)\n\n");
// }

int main(int argc, char** argv)
{

    std::cout << "OPENCV version: " << CV_MAJOR_VERSION << " " << CV_MINOR_VERSION << std::endl; 

    Mat img;
    FILE* f = 0;
    char _filename[1024];

    if( argc == 1 )
    {
        printf("Usage: peopledetect (<image_filename> | <image_list>.txt)\n");
        return 0;
    }
    img = imread(argv[1]);

    if( img.data )
    {
        strcpy(_filename, argv[1]);
    }
    else
    {
        f = fopen(argv[1], "rt");
        if(!f)
        {
            fprintf( stderr, "ERROR: the specified file could not be loaded\n");
            return -1;
        }
    }

    HOGDescriptor hog;
    hog.setSVMDetector(HOGDescriptor::getDefaultPeopleDetector());
    namedWindow("people detector", 1);

    for(;;)
    {
        char* filename = _filename;
        if(f)
        {
            if(!fgets(filename, (int)sizeof(_filename)-2, f))
                break;
            //while(*filename && isspace(*filename))
            //  ++filename;
            if(filename[0] == '#')
                continue;
            int l = (int)strlen(filename);
            while(l > 0 && isspace(filename[l-1]))
                --l;
            filename[l] = '\0';
            img = imread(filename);
        }
        printf("%s:\n", filename);
        if(!img.data)
            continue;

        fflush(stdout);
        vector<Rect> found, found_filtered;
        double t = (double)getTickCount();
        // run the detector with default parameters. to get a higher hit-rate
        // (and more false alarms, respectively), decrease the hitThreshold and
        // groupThreshold (set groupThreshold to 0 to turn off the grouping completely).
        hog.detectMultiScale(img, found, 0, Size(8,8), Size(32,32), 1.05, 2);
        t = (double)getTickCount() - t;
        printf("detection time = %gms\n", t*1000./cv::getTickFrequency());

        std::cout << "found: " << found.size() << std::endl;

        size_t i, j;
        for( i = 0; i < found.size(); i++ )
        {
            Rect r = found[i];
            for( j = 0; j < found.size(); j++ )
                if( j != i && (r & found[j]) == r)
                    break;
            if( j == found.size() )
                found_filtered.push_back(r);
        }
        for( i = 0; i < found_filtered.size(); i++ )
        {
            Rect r = found_filtered[i];
            // the HOG detector returns slightly larger rectangles than the real objects.
            // so we slightly shrink the rectangles to get a nicer output.
            r.x += cvRound(r.width*0.1);
            r.width = cvRound(r.width*0.8);
            r.y += cvRound(r.height*0.07);
            r.height = cvRound(r.height*0.8);
            rectangle(img, r.tl(), r.br(), cv::Scalar(0,255,0), 3);
        }
        imshow("people detector", img);
        int c = waitKey(0) & 255;
        if( c == 'q' || c == 'Q' || !f)
            break;
    }
    if(f)
        fclose(f);
    return 0;
}
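
The nested-rectangle filter in the loop above can be tried on its own, without OpenCV. The sketch below uses a hypothetical stand-in struct SimpleRect instead of cv::Rect and replaces (r & found[j]) == r with an explicit containment test, which gives the same answer for rectangles with positive area.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for cv::Rect so the filter can be tested without OpenCV.
struct SimpleRect { int x, y, width, height; };

// True if a lies completely inside b. For a rectangle with positive area this is
// the same test as (a & b) == a on cv::Rect.
bool isInside(const SimpleRect& a, const SimpleRect& b) {
    return a.x >= b.x && a.y >= b.y &&
           a.x + a.width <= b.x + b.width &&
           a.y + a.height <= b.y + b.height;
}

// Keep each detection that is not contained in some other detection,
// mirroring the i/j loop in peopledetect.cpp.
std::vector<SimpleRect> dropNested(const std::vector<SimpleRect>& found) {
    std::vector<SimpleRect> kept;
    for (std::size_t i = 0; i < found.size(); ++i) {
        std::size_t j = 0;
        for (; j < found.size(); ++j)
            if (j != i && isInside(found[i], found[j]))
                break;                     // found[i] is nested: discard it
        if (j == found.size())
            kept.push_back(found[i]);      // no container found: keep it
    }
    return kept;
}
```

One consequence of this rule, shared with the original loop: two identical rectangles each count as being inside the other, so both are dropped.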

Answer 1:

HOG relies on trained data (the SVM detector), so to use it effectively you have three options:

Use images of the same or a similar kind as the training data, i.e. shots like those in the INRIA dataset (the easy way).
Build your own training data to use with HOG (the hard way).
Find a very generic pre-trained SVM that can be applied almost anywhere (hard to find).
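
For the first option, one cheap way (my suggestion, not part of the answer) to bring small pedestrians closer to the training data is to enlarge the image until their detection windows are about as tall as the detector window, run the detector, and scale the boxes back down. The helpers upscaleFactor and toOriginal are hypothetical; expectedH is the height of the whole window around a person, which is somewhat larger than the person, as the shrinking comment in the sample notes. In the program from the question, the enlargement itself would be cv::resize(img, big, cv::Size(), f, f, cv::INTER_LINEAR) followed by hog.detectMultiScale(big, found, ...). Upscaling adds no real detail, so results on very small people may still be poor.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper: how much to enlarge the image so that a detection window
// expected to be expectedH pixels tall reaches the detector's window height winH
// (128 for the default 64x128 people detector). Never shrinks (factor >= 1).
double upscaleFactor(int expectedH, int winH = 128) {
    return std::max(1.0, static_cast<double>(winH) / expectedH);
}

// Hypothetical box type; toOriginal maps a detection from the enlarged image
// back to original-image coordinates by dividing by the enlargement factor.
struct Box { int x, y, width, height; };

Box toOriginal(const Box& b, double factor) {
    return { static_cast<int>(std::lround(b.x / factor)),
             static_cast<int>(std::lround(b.y / factor)),
             static_cast<int>(std::lround(b.width / factor)),
             static_cast<int>(std::lround(b.height / factor)) };
}
```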

Answer 2:

As blackibiza's answer says, I had two main choices: find an already trained classifier, or train one myself.

In the end, I managed to train a HOG classifier both with SVMlight and with the SVM included in OpenCV.

The answer is yes: detection depends on the sample size used for training. If the classifier was trained on 64x128-pixel samples and you try to detect smaller objects, it doesn't work. The opposite does work, though: you can detect bigger objects by building an image pyramid (repeatedly downscaling the image) and running multi-scale detection, which OpenCV also implements in detectMultiScale.
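
To make the scale argument concrete, here is a simplified model of the pyramid a multi-scale sliding-window search builds. It is modeled on my reading of OpenCV's HOGDescriptor::detectMultiScale (scale factor 1.05 as in the sample, at most 64 levels as the default nlevels), but pyramidScales is a hypothetical function and may differ from OpenCV in details such as rounding.

```cpp
#include <cmath>
#include <vector>

// Hypothetical model of the pyramid levels of a multi-scale HOG search.
// At a level with factor `scale` the image is shrunk by `scale`, so a
// winW x winH window there covers (winW*scale) x (winH*scale) original pixels.
// The first level is scale 1.0: nothing smaller than the window is ever scanned.
std::vector<double> pyramidScales(int imgW, int imgH, int winW, int winH,
                                  double scale0 = 1.05, int maxLevels = 64) {
    std::vector<double> scales;
    double scale = 1.0;
    for (int level = 0; level < maxLevels; ++level) {
        // Stop once the shrunk image can no longer hold one detection window.
        if (std::lround(imgW / scale) < winW || std::lround(imgH / scale) < winH)
            break;
        scales.push_back(scale);
        if (scale0 <= 1.0)
            break;                 // a factor <= 1 means single-scale detection
        scale *= scale0;
    }
    return scales;
}
```

For a 640x480 image and the 64x128 window, the levels run from scale 1.0 up to about 3.73, so window heights from 128 to roughly 478 pixels are scanned; nothing below the 128-pixel window is ever examined, which is why much smaller pedestrians are missed.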

Although they are not documented for the CPU part, you can find pre-trained detectors somewhere on the net, or you can use the latest OpenCV (version 2.4.8) and look at the gpu module, where you'll find these methods: gpu::HOGDescriptor::getPeopleDetector48x96 and gpu::HOGDescriptor::getPeopleDetector64x128, which are the two already trained HOG detectors.
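
The two pre-trained detectors are not interchangeable, because the length of the HOG descriptor, and therefore of the SVM weight vector passed to setSVMDetector, depends on the window size. The hypothetical helper below computes that length using OpenCV's default block and cell geometry (16x16 blocks, 8x8 block stride, 8x8 cells, 9 orientation bins), which I assume both detectors share.

```cpp
#include <cstddef>

// Hypothetical helper: number of values in a HOG descriptor for one window.
// Defaults match OpenCV's default HOGDescriptor block/cell geometry.
std::size_t hogDescriptorLength(int winW, int winH,
                                int blockSize = 16, int blockStride = 8,
                                int cellSize = 8, int nbins = 9) {
    // How many block positions fit along each axis of the window.
    int blocksX = (winW - blockSize) / blockStride + 1;
    int blocksY = (winH - blockSize) / blockStride + 1;
    // Each block has (blockSize/cellSize)^2 cells, each with nbins histogram bins.
    int cellsPerBlock = (blockSize / cellSize) * (blockSize / cellSize);
    return static_cast<std::size_t>(blocksX) * blocksY * cellsPerBlock * nbins;
}
```

So a 64x128 detector carries 7 x 15 = 105 blocks of 36 values, 3780 weights in total (plus an optional bias term), while a 48x96 detector carries 1980; the HOGDescriptor's winSize must match the detector you load. If I remember correctly, newer OpenCV versions (3.x and later) also offer a 48x96 detector on the CPU as HOGDescriptor::getDaimlerPeopleDetector(), used with winSize = Size(48, 96).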

As a final remark, I was worried about training time, but with roughly 500 samples the training process takes only a few minutes on an ordinary laptop.
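
After training with SVMlight or OpenCV's SVM, the linear model still has to be packed into the vector that setSVMDetector() takes: one weight per descriptor element, followed by a single bias term that the detector adds to the dot product before comparing against hitThreshold. The sketch below assumes the trained model's decision function has the form f(x) = dot(w, x) - rho (SVMlight writes b for rho, as far as I know); packDetector and windowScore are hypothetical names. The sign is worth checking on a few positive samples, since with OpenCV's own SVM it may come out flipped depending on how the labels were assigned.

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper: pack a linear SVM with f(x) = dot(w, x) - rho into the
// "weights followed by bias" layout used by HOGDescriptor::setSVMDetector().
std::vector<float> packDetector(const std::vector<float>& w, float rho) {
    std::vector<float> detector(w);
    detector.push_back(-rho);  // bias = -rho, so bias + dot(w, x) equals f(x)
    return detector;
}

// Score of one window descriptor x (x.size() == detector.size() - 1):
// bias + dot(weights, x). A window counts as a hit when the score is at least
// hitThreshold (0 in the sample's detectMultiScale call).
double windowScore(const std::vector<float>& detector, const std::vector<float>& x) {
    double bias = detector.back();
    return std::inner_product(x.begin(), x.end(), detector.begin(), bias);
}
```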