I am reading a single pixel's depth from the framebuffer to implement picking. Originally, my glReadPixels() call took a very long time (5ms or so), and on nVidia it would even burn 100% CPU during that time. On Intel it was slow as well, but with the CPU idle.
Since then, I have switched to a Pixel Buffer Object (PBO) to make glReadPixels() asynchronous, and I double buffer it following this well-known example.
This approach works well and lets me make the glReadPixels() call asynchronous, but only if I read RGBA values. If I use the same PBO approach to read depth values, glReadPixels() blocks again.
Reading RGBA: glReadPixels() takes 12µs.
Reading DEPTH: glReadPixels() takes 5ms.
I tried this on both nVidia and Intel drivers, with different format/type combinations. I tried:
glReadPixels( srcx, srcy, 1, 1, GL_DEPTH_COMPONENT, GL_FLOAT, 0 );
and:
glReadPixels( srcx, srcy, 1, 1, GL_DEPTH_STENCIL, GL_UNSIGNED_INT_24_8, 0 );
and:
glReadPixels( srcx, srcy, 1, 1, GL_DEPTH_STENCIL, GL_FLOAT_32_UNSIGNED_INT_24_8_REV, 0 );
None of these results in an asynchronous glReadPixels() call. But if I read RGBA values with the following call:
glReadPixels( srcx, srcy, 1, 1, GL_RGBA, GL_UNSIGNED_BYTE, 0 );
then glReadPixels() returns immediately and no longer blocks.
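For reference, here is a minimal sketch of how a value read with the GL_DEPTH_STENCIL / GL_UNSIGNED_INT_24_8 variant above could be decoded once the PBO is mapped. It assumes the standard packing for that type (depth in the upper 24 bits, stencil in the lower 8). The helper names are hypothetical and not part of OpenGL.

```c
#include <stdint.h>

/* Hypothetical helpers for one GL_UNSIGNED_INT_24_8 texel. */

/* Depth occupies the upper 24 bits of the packed value. */
static inline uint32_t depth24_of(uint32_t packed)
{
    return packed >> 8;
}

/* Stencil occupies the lower 8 bits of the packed value. */
static inline uint8_t stencil8_of(uint32_t packed)
{
    return (uint8_t)(packed & 0xFFu);
}

/* Normalize the 24-bit depth to the [0, 1] range. */
static inline double depth01_of(uint32_t packed)
{
    return depth24_of(packed) / 16777215.0; /* 2^24 - 1 */
}
```

With GL_DEPTH_COMPONENT / GL_FLOAT the mapped value is already a float in [0, 1], so no decoding of this kind should be needed.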
Before reading the single pixel, I do:
glReadBuffer( GL_FRONT );
glBindBuffer( GL_PIXEL_PACK_BUFFER, pboid );
And I create the double buffered PBO with:
glGenBuffers( NUMPBO, pboids );
for ( int i=0; i<NUMPBO; ++i )
{
    const int pboid = pboids[i];
    glBindBuffer( GL_PIXEL_PACK_BUFFER, pboid );
    glBufferData( GL_PIXEL_PACK_BUFFER, DATA_SIZE, 0, GL_STREAM_READ );
...
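The double buffering itself comes down to index bookkeeping: each frame, glReadPixels() writes into one PBO while the PBO written longest ago is mapped. Below is a minimal sketch of that bookkeeping only, with no GL calls. The PboRing type and function names are hypothetical, and count stands in for NUMPBO.

```c
#include <stddef.h>

/* Hypothetical bookkeeping for a ring of pixel pack buffers. */
typedef struct {
    size_t count; /* number of PBOs in the ring (NUMPBO) */
    size_t frame; /* number of reads issued so far */
} PboRing;

/* Slot that this frame's glReadPixels() should write into. */
static size_t pbo_write_slot(const PboRing *r)
{
    return r->frame % r->count;
}

/* Slot written longest ago, i.e. the one to map this frame.
   Only meaningful once pbo_has_result() returns nonzero. */
static size_t pbo_read_slot(const PboRing *r)
{
    return (r->frame + 1) % r->count;
}

/* Nonzero once the read slot has actually been written to. */
static int pbo_has_result(const PboRing *r)
{
    return r->frame + 1 >= r->count;
}

/* Call once per frame, after issuing the read and the map. */
static void pbo_advance(PboRing *r)
{
    r->frame++;
}
```

With two PBOs, the first frame writes slot 0 and has nothing to map yet; the second frame writes slot 1 and maps slot 0, so the result used for picking lags one frame behind.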
I create my framebuffer using SDL2 with depth size 24, stencil size 8, and the default double buffer.
I am using OpenGL Core Profile 3.3 on Ubuntu LTS.