Non-blocking glReadPixels of depth values with PBO

I am reading a single pixel's depth from the framebuffer to implement picking. Originally my glReadPixels() call took a very long time (about 5 ms), and on NVIDIA it would even burn 100% CPU during that time. On Intel it was just as slow, but with an idle CPU.

Since then, I have used a pixel buffer object (PBO) to make glReadPixels() asynchronous, double-buffered as in this well-known example.

This approach works well and makes the glReadPixels() call asynchronous, but only if I read RGBA values. If I use the same PBO approach to read depth values, glReadPixels() blocks again.

Reading RGBA: glReadPixels() takes 12µs.

Reading DEPTH: glReadPixels() takes 5ms.

I tried this on both NVIDIA and Intel drivers, with different format/type combinations:

glReadPixels( srcx, srcy, 1, 1, GL_DEPTH_COMPONENT, GL_FLOAT, 0 );

and:

glReadPixels( srcx, srcy, 1, 1, GL_DEPTH_STENCIL, GL_UNSIGNED_INT_24_8, 0 );

and:

glReadPixels( srcx, srcy, 1, 1, GL_DEPTH_STENCIL, GL_FLOAT_32_UNSIGNED_INT_24_8_REV, 0 );

None of these results in an asynchronous glReadPixels() call. But if I read RGBA values with the following call:

glReadPixels( srcx, srcy, 1, 1, GL_RGBA, GL_UNSIGNED_BYTE, 0 );

then glReadPixels() returns immediately and no longer blocks.

Before reading the single pixel, I do:

glReadBuffer( GL_FRONT );
glBindBuffer( GL_PIXEL_PACK_BUFFER, pboid );

And I create the double-buffered PBOs with:

glGenBuffers( NUMPBO, pboids );
for ( int i=0; i<NUMPBO; ++i )
{
    const int pboid = pboids[i];
    glBindBuffer( GL_PIXEL_PACK_BUFFER, pboid );
    glBufferData( GL_PIXEL_PACK_BUFFER, DATA_SIZE, 0, GL_STREAM_READ );
    ...
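
One thing worth double-checking when switching between the format/type combinations above is that DATA_SIZE covers the largest pixel being read back: GL_FLOAT_32_UNSIGNED_INT_24_8_REV writes 8 bytes per pixel, while the other three combinations write 4. The following is only a sketch of that arithmetic. The Readback enum and both function names are hypothetical labels made up for illustration (they are not OpenGL identifiers), and it assumes the default GL_PACK_ALIGNMENT of 4, under which these pixel sizes need no row padding.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical labels for the four readback variants tried above.
// These are NOT OpenGL enums; they only name the format/type pairs.
enum class Readback {
    RgbaU8,            // GL_RGBA          + GL_UNSIGNED_BYTE
    DepthF32,          // GL_DEPTH_COMPONENT + GL_FLOAT
    Depth24Stencil8,   // GL_DEPTH_STENCIL + GL_UNSIGNED_INT_24_8
    Depth32FStencil8   // GL_DEPTH_STENCIL + GL_FLOAT_32_UNSIGNED_INT_24_8_REV
};

// Bytes that glReadPixels() writes per pixel for each variant.
// Only the 32F/24_8_REV type uses two 32-bit words; the rest use one.
constexpr std::size_t bytesPerPixel(Readback r) {
    return r == Readback::Depth32FStencil8 ? 8 : 4;
}

// Minimum PBO size for a width x height read. All pixel sizes here are
// multiples of 4, so the default pack alignment adds no padding.
inline std::size_t pboDataSize(Readback r, std::size_t width, std::size_t height) {
    assert(width > 0 && height > 0);
    return width * height * bytesPerPixel(r);
}
```

For the single-pixel reads in this question, a buffer of 8 bytes is enough for every variant listed.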

I create my framebuffer using SDL2 with depth size 24, stencil size 8, and the default double buffer.

I am using OpenGL Core Profile 3.3 on Ubuntu LTS.

I don't actually read the pixel depth (via glMapBuffer) until the next frame, so there is no synchronization going on. glReadPixels() should have triggered an asynchronous operation and returned immediately (as it does for RGBA). But for depth it does not.
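
To make the intended schedule concrete, here is a rough sketch of the index bookkeeping for the double-buffered PBOs: each frame issues glReadPixels() into one PBO and maps a different one that was filled earlier. The struct and function names are hypothetical helpers, not part of OpenGL, and the actual glBindBuffer/glReadPixels/glMapBuffer calls are left out.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type: which PBO slot to use for each operation.
struct PboSlots {
    std::size_t write;  // bind this PBO before calling glReadPixels()
    std::size_t map;    // bind this PBO before calling glMapBuffer()
};

// Round-robin schedule: the frame writes into slot (frame % numPbo) and
// maps the next slot, which holds the oldest result still pending.
inline PboSlots pboSlotsForFrame(unsigned long frame, std::size_t numPbo) {
    assert(numPbo >= 2 && "need at least two PBOs to overlap transfer and map");
    const std::size_t write = frame % numPbo;
    const std::size_t map = (frame + 1) % numPbo;
    return {write, map};
}
```

With two PBOs, the slot mapped in a given frame is the one written in the previous frame. During the very first frame the mapped slot has not been written yet, so its contents should be ignored.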

Reading depth asynchronously across frames would require there to be two depth buffers. But there aren't. Multi-buffering refers to the number of color buffers, since those are what actually get displayed. Implementations pretty much never give you multiple depth buffers.

In order to service a read from the depth buffer, that read has to happen before "the next frame" takes place. So there would need to be synchronization.

Generally speaking, it's best to read from your own images. That way, you have complete control over things like format, when they get reused, and the like, so that you can control issues of synchronization. If you need two depth buffers so that you can read from one while using the other, then you need to create that.
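
As a rough illustration of why one depth buffer forces synchronization while two do not, consider this bookkeeping sketch for application-owned depth attachments. The names are hypothetical, creating the framebuffer objects themselves is omitted, and it assumes each frame renders into one attachment and reads back the one finished in the previous frame.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bookkeeping for a ring of application-owned depth attachments.
struct DepthTargets {
    std::size_t render;    // attachment being rendered into this frame
    std::size_t readback;  // attachment rendered in the previous frame
};

inline DepthTargets depthTargetsForFrame(unsigned long frame, std::size_t count) {
    assert(count >= 1);
    const std::size_t render = frame % count;
    const std::size_t readback = (frame + count - 1) % count;
    return {render, readback};
}

// True when reading the previous frame's depth would touch the same
// attachment that the current frame is rendering into.
inline bool readbackCollides(unsigned long frame, std::size_t count) {
    const DepthTargets t = depthTargetsForFrame(frame, count);
    return t.render == t.readback;
}
```

With a single attachment the read and the next frame's rendering always collide, so the read must finish first. With two attachments they never share a target, which leaves the driver a full frame to complete the transfer.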

And FYI: reading from the default framebuffer at all is dubious due to pixel ownership issues and such. But reading from the front buffer is pretty much always the wrong thing.