Problems with OpenCL/OpenGL Texture interop / wind

To improve rendering quality I'm writing on a versatile separable downscaler in OpenCL 1.1.

The basic image (covering only a small part of the final image) is rendered into a very large framebuffer. Then its color-attached texture is downsampled and placed into another texture via OpenCL. Finally a screen-aligned quad gets rendered to show the result.

So far the idea. What do we have:

2 instances of the downscaler-kernel (it stores the results with coordinates exchanged (i.e. as (y,x) )
inputTexture (the color attachment of the rtt-framebuffer)
tempTexture, size: inputHeight x outputWidth, created with CL_MEM_READ_WRITE
outputTexture

Running kernel_instance_1( <otherParams>, inputTexture, tempTexture ) produces the desired result, but only in the very first frame - somehow the changes happening in the animation don't show up at all. As I get no errors (see below) I assume the kernel runs every frame, but the source texture content stays the same (which it doesn't, I also have a live-output of that texture).

Question: Do I have to call clCreateFromGLTexture2D() every time the contents of the framebuffer changed?

EDIT I just realized: the inputTexture is still attached to the framebuffer object's GL_COLOR_ATTACHMENT0 - may this be a problem? ENDEDIT

Running kernel_instance_2( <otherParams>, tempTexture, outputTexture ) yields no visible result, even with a barrier enqueued between both kernel calls. I.e. the outputTexture stays empty.

Question: Do I need to release and re-acquire the texture object tempTexture in between both kernel calls, so OpenCL sees the changes?

Just to see what OpenCL-calls are made, the following output was produced:

clCreateKernel( separable_X )
clRetainMemObject( separable_X::convolution )
clCreateKernel( separable_Y )
clRetainMemObject( separable_Y::convolution )
clCreateFromGLTexture2D( separable_X::dst + separable_y::src, texID=24, usage=temporary (source and target) )
clCreateFromGLTexture2D( separable_Y::dst, texID=18, usage=target )
clCreateFromGLTexture2D( separable_X::src, texID=22, usage=source )
clRetainMemObject( separable_X::dst )
clRetainMemObject( separable_Y::src )
clRetainMemObject( separable_Y::dst )
clRetainMemObject( clearEmpty::dst )
clEnqueueAcquireGLObjects( count=3 )
clEnqueueBarrier()
clSetKernelArg( separable_X::convert )
clSetKernelArg( separable_X::offset )
clSetKernelArg( separable_X::convolution )
clSetKernelArg( separable_X::dst )
clSetKernelArg( separable_X::src )
clEnqueueNDRangeKernel( separable_X, (1440, 1080, 0), waiting4 0 events )
clSetKernelArg( separable_Y::convert )
clSetKernelArg( separable_Y::offset )
clEnqueueBarrier()
clSetKernelArg( separable_Y::convolution )
clSetKernelArg( separable_Y::dst )
clSetKernelArg( separable_Y::src )
clEnqueueNDRangeKernel( separable_Y, (540, 1440, 0), waiting4 0 events )
clEnqueueBarrier()
clEnqueueReleaseGLObjects( count=3 )

If any call had produced an error, it would've been inside that output.

Another situation I get lots of times is that clEnqueueReleaseGLObjects() returns error code -9999, which somebody filed as "NVidia: Illegal read or write to a buffer".

Question: could it be that write_imagef() does not clamp the color value if any component exceeds 1.0f and the storage format is RGBA8? So that'd actually mean one must write write_imagef( texture, (int2)coord, clamp( color, 0.f, 1.f ) );...

Thanks a lot in advance - this gets me banging my head since nearly a week...

EDIT Some more infos that might be worth mentioning:

how I can distinguish the two instances?
There are 2 distinct __kernel functions with different names (separable_X and separable_Y) inside the program source, which both have the same body calling the separable()-function.

how do I sync between GL and CL?
- the function taking care of acquiring GL objects issues a glFinish() before calling clEnqueueAcquireGLObjects()
- I wait for completion of clEnqueueReleaseGLObjects() by using cl_events (likely to change in the future)

You're using glFinish before clEnqueueAcquireGLObjects which is correct, but you should also call clFinish AFTER clEnqueueReleaseGLObjects. Read section 9.8.6.2 of the OpenCL 1.1 specification carefully.

Also, to your other questions: