Accelerating OpticalFlow Algorithm - OpenCV

Posted 2020-07-27 04:18

Question:

I am working on a project that estimates a UAV's location using an optical-flow algorithm. I am currently using cv::calcOpticalFlowFarneback for this purpose.
My hardware is an Odroid U3 that will finally be connected to the UAV flight controller.

The problem is that this method is too computationally heavy for this hardware, so I am looking for other ways to optimize or accelerate it.

Things that I've already tried:

  • Reducing resolution to 320x240 or even 160x120.
  • Using OpenCV TBB (compiled using WITH_TBB=ON BUILD_TBB=ON and adding -ltbb).
  • Changing optical-flow parameters as suggested here

Here is the relevant part of my code:

int opticalFlow(){

    // capture from camera
    VideoCapture cap(0);
    if( !cap.isOpened() )
        return -1;

    // Set Resolution - The Default Resolution Is 640 x 480
    cap.set(CV_CAP_PROP_FRAME_WIDTH,WIDTH_RES);
    cap.set(CV_CAP_PROP_FRAME_HEIGHT,HEIGHT_RES);

    Mat flow, cflow, undistortFrame, processedFrame, origFrame, croppedFrame;
    UMat gray, prevgray, uflow;

    currLocation.x = 0;
    currLocation.y = 0;

    // for each frame calculate optical flow
    for(;;)
    {
        // take out frame- still distorted
        cap >> origFrame;

        // Convert to gray
        cvtColor(origFrame, processedFrame, COLOR_BGR2GRAY);

        // rotate image - perspective transformation
        rotateImage(processedFrame, gray, eulerFromSensors.roll, eulerFromSensors.pitch, 0, 0, 0, 1, cameraMatrix.at<double>(0,0),
        cameraMatrix.at<double>(0,2),cameraMatrix.at<double>(1,2));

        if( !prevgray.empty() )
        {
            // calculate flow
            calcOpticalFlowFarneback(prevgray, gray, uflow, 0.5, 3, 10, 3, 3, 1.2, 0);
            uflow.copyTo(flow);

            // get average
            calcAvgOpticalFlow(flow, 16, corners);

            /*
            Some other calculations
            .
            .
            .
            Updating currLocation struct
            */
        }
        //break conditions
        if(waitKey(1)>=0)
            break;
        if(end_run)
            break;
        std::swap(prevgray, gray);
    }
    return 0;
}

Notes:

  • I've run callgrind, and the bottleneck is, as expected, the calcOpticalFlowFarneback function.
  • I checked the per-core CPU load while running the program: it does not use all 4 cores heavily; only one core is at 100% at any given time (even with TBB).

Answer 1:

Optical flow estimation is in general quite a time-consuming operation. I would suggest changing the optical flow method.

DualTVL1OpticalFlow is a more performant method in OpenCV that you can use. If this method is still too slow, calcOpticalFlowPyrLK should be used. However, this is a sparse motion estimation method and does not directly return a dense motion field. To get one: initialize a set of points on a grid over your frame (e.g. grid step = 10) and track them with calcOpticalFlowPyrLK. The difference between the tracked and initial points gives you the optical flow at each grid position. Finally, interpolate between the grid points, e.g. with nearest-neighbour or linear interpolation.
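The grid-tracking steps above can be sketched without OpenCV as follows. This is a minimal sketch, assuming the initial points, tracked points and status vector have already been produced by calcOpticalFlowPyrLK. The Pt type and the gridFlow/denseNearest names are hypothetical; the grid is assumed to start at (0,0) and be stored row by row, untracked points are assumed to get zero flow, and nearest-neighbour is the interpolation chosen here.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal point type standing in for cv::Point2f.
struct Pt { float x, y; };

// Flow at each grid node = tracked position - initial position.
// 'status' mirrors the status vector of calcOpticalFlowPyrLK (1 = tracked).
// Untracked nodes get zero flow (an assumption; other strategies exist).
std::vector<Pt> gridFlow(const std::vector<Pt>& initial,
                         const std::vector<Pt>& tracked,
                         const std::vector<unsigned char>& status) {
    std::vector<Pt> flow(initial.size(), Pt{0.f, 0.f});
    for (std::size_t k = 0; k < initial.size(); ++k)
        if (status[k])
            flow[k] = Pt{tracked[k].x - initial[k].x, tracked[k].y - initial[k].y};
    return flow;
}

// Nearest-neighbour interpolation of a regular grid (origin 0,0, spacing
// 'step', nx columns by ny rows, stored row by row) into a dense
// width x height field, also stored row by row.
std::vector<Pt> denseNearest(const std::vector<Pt>& grid, int nx, int ny,
                             int step, int width, int height) {
    std::vector<Pt> dense(static_cast<std::size_t>(width) * height);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // Round to the nearest grid node, then clamp to the grid extent.
            int gx = static_cast<int>(std::lround(static_cast<double>(x) / step));
            int gy = static_cast<int>(std::lround(static_cast<double>(y) / step));
            if (gx > nx - 1) gx = nx - 1;
            if (gy > ny - 1) gy = ny - 1;
            dense[static_cast<std::size_t>(y) * width + x] =
                grid[static_cast<std::size_t>(gy) * nx + gx];
        }
    }
    return dense;
}
```

With cv::Point2f in place of Pt, the same loops apply to the vectors returned by calcOpticalFlowPyrLK. Linear (bilinear) interpolation would replace the rounding step with a weighted blend of the four surrounding grid nodes.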



Answer 2:

First, I want to thank the author of this answer, which I used to build my final solution. I will explain that solution in as much detail as I can.

My solution is divided into two parts:

  1. Multithreading - Split each frame into 4 matrices, one per quarter, create 4 threads, and process each quarter in its own thread. I created the 4 quarter matrices with some (5%) overlap between them so that I won't lose the connection between them (see figure below: the yellow part is 55% of the width and 55% of the height).

    Q1 = cv::UMat(gray, Range(0, HEIGHT_RES*0.55), Range(0, WIDTH_RES*0.55));
    Q2 = cv::UMat(gray, Range(0, HEIGHT_RES*0.55), Range(WIDTH_RES*0.45, WIDTH_RES));
    Q3 = cv::UMat(gray, Range(0.45*HEIGHT_RES, HEIGHT_RES), Range(0, WIDTH_RES*0.55));
    Q4 = cv::UMat(gray, Range(0.45*HEIGHT_RES, HEIGHT_RES), Range(WIDTH_RES*0.45, WIDTH_RES));
    

    Each thread does the optical flow processing (part 2 below) on one quarter, and the main loop waits for all threads to finish before collecting and averaging the results.

  2. Using a sparse method - Use calcOpticalFlowPyrLK within a selected ROI grid instead of calcOpticalFlowFarneback. The Lucas-Kanade sparse method consumes much less CPU time than the Farneback dense method. In my case I created a grid with gridstep=10. This is the simple function for creating the grid:

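    // Fill 'grid' with one point every 'step' pixels across a wRes x hRes frame.
    void createGrid(std::vector<cv::Point2f> &grid, int16_t wRes, int16_t hRes, int step){
        for (int i = 0; i < wRes; i += step)      // x (column) coordinate
            for (int j = 0; j < hRes; j += step)  // y (row) coordinate
                grid.push_back(cv::Point2f(i, j));
    }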
    

    Note that if the grid is constant during the whole run, it is better to create it only once, before entering the main loop.
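The overlapping quarter bounds from part 1 can be computed once per resolution. This is a minimal sketch, assuming integer pixel bounds are wanted; the Span/Quadrant types and overlappingQuadrants are hypothetical names. The upper bound is rounded and the lower bound is taken as dim - hi, so the overlap stays symmetric after rounding; for 320x240 and 0.55 this gives the same 45%/55% bounds as the Range expressions in part 1.

```cpp
#include <array>
#include <cmath>

// Half-open row/column interval, mirroring cv::Range(start, end).
struct Span { int start, end; };
struct Quadrant { Span rows, cols; };

// Splits a width x height frame into four quarters that each cover 'frac'
// (e.g. 0.55) of each dimension. Each quarter extends (frac - 0.5) past the
// midline, so neighbouring quarters share a band of (2*frac - 1) of the frame.
std::array<Quadrant, 4> overlappingQuadrants(int width, int height, double frac) {
    const int hiRow = static_cast<int>(std::lround(height * frac)); // end of top band
    const int hiCol = static_cast<int>(std::lround(width * frac));  // end of left band
    const int loRow = height - hiRow;  // start of bottom band (symmetric)
    const int loCol = width - hiCol;   // start of right band (symmetric)
    return {{
        {{0, hiRow}, {0, hiCol}},          // Q1: top-left
        {{0, hiRow}, {loCol, width}},      // Q2: top-right
        {{loRow, height}, {0, hiCol}},     // Q3: bottom-left
        {{loRow, height}, {loCol, width}}  // Q4: bottom-right
    }};
}
```

Each Quadrant q maps onto OpenCV as cv::UMat(gray, cv::Range(q.rows.start, q.rows.end), cv::Range(q.cols.start, q.cols.end)).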

After implementing both parts, all 4 cores of the Odroid U3 were constantly working at 60%-80% load while the program ran, and the processing was noticeably faster.
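The fan-out/join structure from part 1 (four threads, the main loop waits for them, then averages) can be sketched with std::thread. This is a minimal sketch under stated assumptions: Vec2, meanFlow and processQuadrants are hypothetical names, each quarter's flow vectors are assumed to be already available (in the real program they would come from calcOpticalFlowPyrLK on that quarter's grid), and the four per-quarter results are combined with an unweighted mean.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical 2-D flow vector.
struct Vec2 { double x, y; };

// Mean of a set of flow vectors; returns {0,0} for an empty set.
Vec2 meanFlow(const std::vector<Vec2>& v) {
    Vec2 m{0.0, 0.0};
    if (v.empty()) return m;
    for (const Vec2& f : v) { m.x += f.x; m.y += f.y; }
    m.x /= static_cast<double>(v.size());
    m.y /= static_cast<double>(v.size());
    return m;
}

// Runs 'work' on each quarter's input in its own thread, waits for all
// four (join), then averages the four results. 'work' is a hypothetical
// stand-in for the per-quarter Lucas-Kanade tracking step.
Vec2 processQuadrants(const std::array<std::vector<Vec2>, 4>& quadrantFlows,
                      const std::function<Vec2(const std::vector<Vec2>&)>& work) {
    std::array<Vec2, 4> results{};
    std::array<std::thread, 4> workers;
    for (std::size_t i = 0; i < 4; ++i)
        // Each thread writes only its own slot, so no locking is needed.
        workers[i] = std::thread([&, i] { results[i] = work(quadrantFlows[i]); });
    for (std::thread& t : workers) t.join();  // wait for all quarters
    Vec2 avg{0.0, 0.0};
    for (const Vec2& r : results) { avg.x += r.x / 4.0; avg.y += r.y / 4.0; }
    return avg;
}
```

Build with -pthread. Spawning threads every frame is the simplest form; a persistent worker pool avoids the per-frame creation cost. Because the quarters overlap, points in the shared bands contribute to two quarters; weighting each quarter by its number of successfully tracked points is one alternative to the plain mean.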