To improve rendering quality I'm writing on a versatile separable downscaler in OpenCL 1.1.
The basic image (covering only a small part of the final image) is rendered into a very large framebuffer. Then its color-attached texture is downsampled and placed into another texture via OpenCL. Finally a screen-aligned quad gets rendered to show the result.
So far the idea. What do we have:
- 2 instances of the downscaler-kernel (it stores the results with coordinates exchanged (i.e. as (y,x) )
- inputTexture (the color attachment of the rtt-framebuffer)
- tempTexture, size: inputHeight x outputWidth, created with CL_MEM_READ_WRITE
- outputTexture
Running kernel_instance_1( <otherParams>, inputTexture, tempTexture )
produces the desired result, but only in the very first frame - somehow the changes happening in the animation don't show up at all. As I get no errors (see below) I assume the kernel runs every frame, but the source texture content stays the same (which it doesn't, I also have a live-output of that texture).
Question:
Do I have to call clCreateFromGLTexture2D() every time the contents of the framebuffer changed?
EDIT I just realized: the inputTexture is still attached to the framebuffer object's GL_COLOR_ATTACHMENT0
- may this be a problem? ENDEDIT
Running kernel_instance_2( <otherParams>, tempTexture, outputTexture )
yields no visible result, even with a barrier enqueued between both kernel calls. I.e. the outputTexture stays empty.
Question:
Do I need to release and re-acquire the texture object tempTexture
in between both kernel calls, so OpenCL sees the changes?
Just to see what OpenCL-calls are made, the following output was produced:
clCreateKernel( separable_X )
clRetainMemObject( separable_X::convolution )
clCreateKernel( separable_Y )
clRetainMemObject( separable_Y::convolution )
clCreateFromGLTexture2D( separable_X::dst + separable_y::src, texID=24, usage=temporary (source and target) )
clCreateFromGLTexture2D( separable_Y::dst, texID=18, usage=target )
clCreateFromGLTexture2D( separable_X::src, texID=22, usage=source )
clRetainMemObject( separable_X::dst )
clRetainMemObject( separable_Y::src )
clRetainMemObject( separable_Y::dst )
clRetainMemObject( clearEmpty::dst )
clEnqueueAcquireGLObjects( count=3 )
clEnqueueBarrier()
clSetKernelArg( separable_X::convert )
clSetKernelArg( separable_X::offset )
clSetKernelArg( separable_X::convolution )
clSetKernelArg( separable_X::dst )
clSetKernelArg( separable_X::src )
clEnqueueNDRangeKernel( separable_X, (1440, 1080, 0), waiting4 0 events )
clSetKernelArg( separable_Y::convert )
clSetKernelArg( separable_Y::offset )
clEnqueueBarrier()
clSetKernelArg( separable_Y::convolution )
clSetKernelArg( separable_Y::dst )
clSetKernelArg( separable_Y::src )
clEnqueueNDRangeKernel( separable_Y, (540, 1440, 0), waiting4 0 events )
clEnqueueBarrier()
clEnqueueReleaseGLObjects( count=3 )
If any call had produced an error, it would've been inside that output.
Another situation I get lots of times is that clEnqueueReleaseGLObjects()
returns error code -9999, which somebody filed as "NVidia: Illegal read or write to a buffer".
Question:
could it be that write_imagef()
does not clamp the color value if any component exceeds 1.0f and the storage format is RGBA8? So that'd actually mean one must write write_imagef( texture, (int2)coord, clamp( color, 0.f, 1.f ) );
...
Thanks a lot in advance - this gets me banging my head since nearly a week...
EDIT Some more infos that might be worth mentioning:
how I can distinguish the two instances?
There are 2 distinct __kernel
functions with different names (separable_X
and separable_Y
) inside the program source, which both have the same body calling the separable()
-function.
how do I sync between GL and CL?
- the function taking care of acquiring GL objects issues a glFinish()
before calling clEnqueueAcquireGLObjects()
- I wait for completion of clEnqueueReleaseGLObjects()
by using cl_events (likely to change in the future)