Low accuracy of shape predictor with default datas

I'm trying to use dlib to train shape predictor with default dataset(/dlib-19.0/examples/faces/training_with_face_landmarks.xml) and default train sample(train_shape_predictor_ex.cpp).

So I want to train shape predictor which will be exactly like default shape predictor(shape_predictor_68_face_landmarks.dat) because I used same dataset and same training code. But I get some issues.

After training I get my .dat file with 16.6mb (but default dlib predictor shape_predictor_68_face_landmarks.dat has 99.7mb). After testing my .dat file (16.6mb) I get low accuracy, but after testing default .dat file (shape_predictor_68_face_landmarks.dat, 16.6mb) I get hight accuracy.

My shape predictor: shape_predictor_68_face_landmarks.dat:

Training:

#include <QCoreApplication>

#include <dlib/image_processing.h>
#include <dlib/data_io.h>
#include <iostream>

using namespace dlib;
using namespace std;

std::vector<std::vector<double> > get_interocular_distances (
        const std::vector<std::vector<full_object_detection> >& objects
        );

int main(int argc, char *argv[])
{
    QCoreApplication a(argc, argv);

    try
    {

        const std::string faces_directory = "/home/user/Documents/dlib-19.0/examples/faces/";

        dlib::array<array2d<unsigned char> > images_train;
        std::vector<std::vector<full_object_detection> > faces_train;

        load_image_dataset(images_train, faces_train, faces_directory+"training_with_face_landmarks.xml");

        shape_predictor_trainer trainer;

        trainer.set_oversampling_amount(300);

        trainer.set_nu(0.05);
        trainer.set_tree_depth(2);

        trainer.be_verbose();

        shape_predictor sp = trainer.train(images_train, faces_train);
        cout << "mean training error: "<<
                test_shape_predictor(sp, images_train, faces_train, get_interocular_distances(faces_train)) << endl;

        serialize(faces_directory+"sp_default_settings.dat") << sp;
    }
    catch (exception& e)
    {
        cout << "\nexception thrown!" << endl;
        cout << e.what() << endl;
    }

    return a.exec();
}

double interocular_distance (
        const full_object_detection& det
        )
{
    dlib::vector<double,2> l, r;
    double cnt = 0;
    // Find the center of the left eye by averaging the points around
    // the eye.
    for (unsigned long i = 36; i <= 41; ++i)
    {
        l += det.part(i);
        ++cnt;
    }
    l /= cnt;

    // Find the center of the right eye by averaging the points around
    // the eye.
    cnt = 0;
    for (unsigned long i = 42; i <= 47; ++i)
    {
        r += det.part(i);
        ++cnt;
    }
    r /= cnt;

    // Now return the distance between the centers of the eyes
    return length(l-r);
}

std::vector<std::vector<double> > get_interocular_distances (
        const std::vector<std::vector<full_object_detection> >& objects
        )
{
    std::vector<std::vector<double> > temp(objects.size());
    for (unsigned long i = 0; i < objects.size(); ++i)
    {
        for (unsigned long j = 0; j < objects[i].size(); ++j)
        {
            temp[i].push_back(interocular_distance(objects[i][j]));
        }
    }
    return temp;
}

Testing:

#include <QCoreApplication>
#include <dlib/image_processing/frontal_face_detector.h>
#include <dlib/image_processing/render_face_detections.h>
#include <dlib/image_processing.h>
#include <dlib/gui_widgets.h>
#include <dlib/image_io.h>
#include <dlib/data_io.h>
#include <iostream>

using namespace dlib;
using namespace std;

int main(int argc, char *argv[])
{
    QCoreApplication a(argc, argv);

    try
        {

            // We need a face detector.  We will use this to get bounding boxes for
            // each face in an image.
            frontal_face_detector detector = get_frontal_face_detector();
            // And we also need a shape_predictor.  This is the tool that will predict face
            // landmark positions given an image and face bounding box.  Here we are just
            // loading the model from the shape_predictor_68_face_landmarks.dat file you gave
            // as a command line argument.
            shape_predictor sp;
            deserialize("/home/user/Downloads/muct-master/samples/sp_default_settings.dat") >> sp;

            string srcDir = "/home/user/Downloads/muct-master/samples/selection/";
            string dstDir = "/home/user/Downloads/muct-master/samples/my_results_default/";

            std::vector<string> vecOfImg;

            vecOfImg.push_back("i001qa-mn.jpg");
            vecOfImg.push_back("i002ra-mn.jpg");
            vecOfImg.push_back("i003ra-fn.jpg");
            vecOfImg.push_back("i003sa-fn.jpg");
            vecOfImg.push_back("i004qa-mn.jpg");
            vecOfImg.push_back("i004ra-mn.jpg");
            vecOfImg.push_back("i005ra-fn.jpg");
            vecOfImg.push_back("i006ra-mn.jpg");
            vecOfImg.push_back("i007qa-fn.jpg");
            vecOfImg.push_back("i008ra-mn.jpg");
            vecOfImg.push_back("i009qa-mn.jpg");
            vecOfImg.push_back("i009ra-mn.jpg");
            vecOfImg.push_back("i009sa-mn.jpg");
            vecOfImg.push_back("i010qa-mn.jpg");
            vecOfImg.push_back("i010sa-mn.jpg");
            vecOfImg.push_back("i011qa-mn.jpg");
            vecOfImg.push_back("i011ra-mn.jpg");
            vecOfImg.push_back("i012ra-mn.jpg");
            vecOfImg.push_back("i012sa-mn.jpg");
            vecOfImg.push_back("i014qa-fn.jpg");

            for(int imgC = 0; imgC < vecOfImg.size(); imgC++){

                array2d<rgb_pixel> img;
                load_image(img, srcDir + vecOfImg.at(imgC));
                // Make the image larger so we can detect small faces.
                pyramid_up(img);

                // Now tell the face detector to give us a list of bounding boxes
                // around all the faces in the image.
                std::vector<rectangle> dets = detector(img);
                cout << "Number of faces detected: " << dets.size() << endl;

                // Now we will go ask the shape_predictor to tell us the pose of
                // each face we detected.
                std::vector<full_object_detection> shapes;
                for (unsigned long j = 0; j < dets.size(); ++j)
                {
                    full_object_detection shape = sp(img, dets[j]);
                    cout << "number of parts: "<< shape.num_parts() << endl;
                    cout << "pixel position of first part:  " << shape.part(0) << endl;
                    cout << "pixel position of second part: " << shape.part(1) << endl;

                    for(unsigned long i = 0; i < shape.num_parts(); i++){
                        draw_solid_circle(img, shape.part(i), 2, rgb_pixel(100,255,100));
                    }

                    save_jpeg(img, dstDir + vecOfImg.at(imgC));
                    // You get the idea, you can get all the face part locations if
                    // you want them.  Here we just store them in shapes so we can
                    // put them on the screen.
                    shapes.push_back(shape);
                }

            }

        }
        catch (exception& e)
        {
            cout << "\nexception thrown!" << endl;
            cout << e.what() << endl;
        }
    return a.exec();
}

What is difference between default and my training and testing if I used default dataset and examples? How did I can train shape predictor as shape_predictor_68_face_landmarks.dat?

回答1:

It is producing a 16.6MB DAT file because you are either using a few images to train, or not using the correct settings.

According to this Github issue, you are not using the optimal/default settings during the train process.

On your settings, the trainer has a very high oversampling amount (300), default is 20. You are also reducing the capacity of the model by increasing the regularization (making nu param smaller) and by using trees with smaller depths.

Your nu param: 0.05. Default is 0.1

Your tree depth: 2. Default is 4

By changing the params and training by trial and error, you will find the optimal accuracy with the smaller file size.

And keep in mind that each training process takes approximately 45 minutes and you need at least a 16GB RAM computer.

回答2:

The example dataset (/dlib-19.0/examples/faces/training_with_face_landmarks.xml) is way too small to train a high quality model. It isn't what the model that comes with dlib is trained on.

The examples use a small dataset to make the examples run fast. The point of all the examples is to explain the dlib API, not to be useful programs. They are just documentation. It's up to you to do something interesting with the dlib API.