As the title says, when I run my OpenCL
kernel the entire screen stops redrawing (the image displayed on monitor remains the same until my program is done with calculations. This is true even in case I unplug it from my notebook and plug it back - allways the same image is displayed) and the computer does not seem to react to mouse moves either - the cursor stays in the same position.
I am not sure why this happens. Could it be a bug in my program, or is this a standard behaviour ?
While searching on Google I found this thread on AMD's forum and some people there suggested it's normal as the GPU can't refresh the screen, when it is busy with computations.
If this is true, is there still any way to work around this ?
My kernel computation can take up to several minutes and having my computer practically unusable for whole that time is really painful.
EDIT1: this is my current setup:
- graphics card is ATI Mobility Radeon HD 5650 with 512 MB of memory and latest Catalyst beta driver from AMD's website
- the graphics is switchable - Intel integrated/ATI dedicated card, but I have disabled switching in BIOS, because otherwise I could not get the driver working on Ubuntu.
- the operating system is Ubuntu 12.10 (64-bits), but this happens on Windows 7 (64-bits) as well.
- I have my monitor plugged in via HDMI (but the notebook screen freezes too, so this should not be a problem)
EDIT2: so after a day of playing with my code, I took the advices from your responses and changed my algorithm to something like this (in pseudo code):
for (cl_ulong chunk = 0; chunk < num_chunks; chunk += chunk_size)
{
/* set kernel arguments that are different for each chunk */
clSetKernelArg(/* ... */);
/* schedule kernel for next execution */
clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL, &global_work_size, NULL, 0, NULL, NULL);
/* read out the results from kernel and append them to output array on host */
clEnqueueReadBuffer(cmd_queue, of_buf, CL_TRUE, 0, chunk_size, output + chunk, 0, NULL, NULL);
}
So now I split the whole workload on host and send it to GPU in chunks. For each chunk of data I enqueue a new kernel and the results that I get from it are appended to the output array at a correct offset.
Is this how you meant that the calculation should be divided ?
This seems like the way to remedy the freeze problem and even more now I am able to process data much larger than the available GPU memory, but I will yet have to make some good performance meassurements, to see what is the good chunk size...