I'm having trouble using multiple GPUs with OpenCL/OpenGL interop. I'm trying to write an application which renders the result of an intensive computation. In the end it will run an optimization problem, and then, based on the result, render something to the screen. As a test case, I'm starting with the particle simulation example code from this course: http://web.engr.oregonstate.edu/~mjb/sig13/
The example code creates and OpenGL context, then creates a OpenCL context that shares the state, using the cl_khr_gl_sharing extension. Everything works fine when I use a single GPU. Creating a context looks like this:
3. create an opencl context based on the opengl context:
cl_context_properties props[ ] =
{
CL_GL_CONTEXT_KHR, (cl_context_properties) glXGetCurrentContext( ),
CL_GLX_DISPLAY_KHR, (cl_context_properties) glXGetCurrentDisplay( ),
CL_CONTEXT_PLATFORM, (cl_context_properties) Platform,
0
};
cl_context Context = clCreateContext( props, 1, Device, NULL, NULL, &status );
if( status != CL_SUCCESS)
{
PrintCLError( status, "clCreateContext: " );
exit(1);
}
Later on, the example creates shared CL/GL buffers with clCreateFromGLBuffer.
Now, I would like to create a context from two GPU devices:
cl_context Context = clCreateContext( props, 2, Device, NULL, NULL, &status );
I've successfully opened the devices, and can query that they both support cl_khr_gl_sharing, and both work individually. However, when attempting to create the context as above, I get
CL_INVALID_OPERATION
Which is an error code added by the cl_khr_gl_sharing extension. In the extension description (linked above) it says
CL_INVALID_OPERATION if a context or share group object was specified for one of CGL, EGL, GLX, or WGL and any of the following conditions hold:
- The OpenGL implementation does not support the window-system binding API for which a context or share group objects was specified.
- More than one of the attributes CL_CGL_SHAREGROUP_KHR, CL_EGL_DISPLAY_KHR, CL_GLX_DISPLAY_KHR, and CL_WGL_HDC_KHR is set to a non-default value.
- Both of the attributes CL_CGL_SHAREGROUP_KHR and CL_GL_CONTEXT_KHR are set to non-default values.
- Any of the devices specified in the argument cannot support OpenCL objects which share the data store of an OpenGL object, as described in section 9.12."
That description doesn't seem to fit any of my cases exactly. Is it not possible to do OpenCL/OpenGL interop with multiple GPUs? Or is it that I have heterogeneous hardware? I printed out a few parameters from my enumerated devices. I've just taken two random GPUs that I could get my hands on.
PlatformID: 18483216
Num Devices: 2
-------- Device 00 ---------
CL_DEVICE_NAME: GeForce GTX 285
CL_DEVICE_VENDOR: NVIDIA Corporation
CL_DEVICE_VERSION: OpenCL 1.0 CUDA
CL_DRIVER_VERSION: 304.88
CL_DEVICE_MAX_COMPUTE_UNITS: 30
CL_DEVICE_MAX_CLOCK_FREQUENCY: 1476
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
-------- Device 01 ---------
CL_DEVICE_NAME: Quadro FX 580
CL_DEVICE_VENDOR: NVIDIA Corporation
CL_DEVICE_VERSION: OpenCL 1.0 CUDA
CL_DRIVER_VERSION: 304.88
CL_DEVICE_MAX_COMPUTE_UNITS: 4
CL_DEVICE_MAX_CLOCK_FREQUENCY: 1125
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
cl_khr_gl_sharing is supported on dev 0.
cl_khr_gl_sharing is supported on dev 1.
Note that if I create the context without the interop portion (such that the props array looks like below) then it successfully creates the context, but obviously cannot share buffers with the OpenGL side of the application.
cl_context_properties props[ ] =
{
CL_CONTEXT_PLATFORM, (cl_context_properties) Platform,
0
};
Several related Questions and Examples
OpenCL specific, but relevant to this question:
"If you enqueue a write to the buffer on queueA(deviceA) then OpenCL will use that device to do the write. However, if you then use the buffer on queueB(deviceB) in the same context, OpenCL will recognize that deviceA has the most recent data and will move it over to deviceB before using it. In short, as long as you use events to ensure that no two devices are trying to access the same memory object at the same time, OpenCL will make sure that each use of the memory object has the most recent data, regardless of which device last used it."
I assume when you take OpenGL out of the equation sharing memory between gpus works as expected?
When you call these two lines:
CL_GL_CONTEXT_KHR, (cl_context_properties) glXGetCurrentContext( ), CL_GLX_DISPLAY_KHR, (cl_context_properties) glXGetCurrentDisplay( ),
the calls need to come from inside a new thread with a new OpenGL context. You can usually only associate one OpenCL context with one OpenGL context for one device at a time per thread.