Read pixel data from default framebuffer in OpenGL

2020-07-27 12:31发布

My goal is to read the contents of the default OpenGL framebuffer and store the pixel data in a cv::Mat. Apparently there are two different ways of achieving this:

1) Synchronous: use FBO and glRealPixels

cv::Mat a = cv::Mat::zeros(cv::Size(1920, 1080), CV_8UC3);
glReadPixels(0, 0, 1920, 1080, GL_BGR, GL_UNSIGNED_BYTE, a.data);

2) Asynchronous: use PBO and glReadPixels

cv::Mat b = cv::Mat::zeros(cv::Size(1920, 1080), CV_8UC3);
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo_userImage);
    glReadPixels(0, 0, 1920, 1080, GL_BGR, GL_UNSIGNED_BYTE, 0);
    unsigned char* ptr = static_cast<unsigned char*>(glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY));
    std::copy(ptr, ptr + 1920 * 1080 * 3 * sizeof(unsigned char), b.data);
    glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);

From all the information I collected on this topic, the asynchronous version 2) should be much faster. However, comparing the elapsed time for both versions yields that the differences are often times minimal, and sometimes version 1) events outperforms the PBO variant.

For performance checks, I've inserted the following code (based on this answer):

std::chrono::steady_clock::time_point begin = std::chrono::steady_clock::now();
....
std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();
std::cout << "Time difference = " << std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count() << std::endl;

I've also experimented with the usage hint when creating the PBO: I didn't find much of difference between GL_DYNAMIC_COPY and GL_STREAM_READ here.

I'd be happy for suggestions how to increase the speed of this pixel read operation from the framebuffer even further.

标签: c++ opengl
1条回答
唯我独甜
2楼-- · 2020-07-27 12:56

Your second version is not asynchronous at all, since you're mapping the buffer immediately after triggering the copy. The map call will then block until the contents of the buffer are available, effectively becoming synchronous.

Or: depending on the driver, it will block when actually reading from it. In other words the driver may implement the mapping in such a way that it causes a pagefault, and a subsequent synchronization. It doesn't really matter in your case, since you are still accessing that data straight away due to the std::copy.

The proper way of doing this is by using sync objects and fences.

Keep your PBO setup, but after issuing the glReadPixels into a PBO, insert a sync object into the stream via glFenceSync. Then, some time later, poll for that fence sync object to be complete (or just wait for it altogether) via glClientWaitSync.

If glClientWaitSync returns that the commands before the fence are complete, you can now read from the buffer without an expensive CPU/GPU sync. (If the driver is particularly stupid and didn't already move the buffer contents into mappable addresses, in spite of your usage hints on the PBO, you can use another thread to perform the map. glGetBufferSubData can be therefore cheaper, as the data doesn't need to be in a mappable range.)


If you need to do this on a frame-by-frame basis, you'll notice that it's very likely that you'll need more than one PBO, that is, have a small pool of them. This is because at the next frame the readback of the previous frame's data is not complete yet and the corresponding fence not signalled. (Yes, GPUs are massively pipelined these days, and they will be some frames behind your submission queue).

查看更多
登录 后发表回答