可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

As the title says, when I run my OpenCL kernel the entire screen stops redrawing (the image displayed on monitor remains the same until my program is done with calculations. This is true even in case I unplug it from my notebook and plug it back - allways the same image is displayed) and the computer does not seem to react to mouse moves either - the cursor stays in the same position.

I am not sure why this happens. Could it be a bug in my program, or is this a standard behaviour ?

While searching on Google I found this thread on AMD's forum and some people there suggested it's normal as the GPU can't refresh the screen, when it is busy with computations.

If this is true, is there still any way to work around this ?

My kernel computation can take up to several minutes and having my computer practically unusable for whole that time is really painful.

EDIT1: this is my current setup:

graphics card is ATI Mobility Radeon HD 5650 with 512 MB of memory and latest Catalyst beta driver from AMD's website
the graphics is switchable - Intel integrated/ATI dedicated card, but I have disabled switching in BIOS, because otherwise I could not get the driver working on Ubuntu.
the operating system is Ubuntu 12.10 (64-bits), but this happens on Windows 7 (64-bits) as well.
I have my monitor plugged in via HDMI (but the notebook screen freezes too, so this should not be a problem)

EDIT2: so after a day of playing with my code, I took the advices from your responses and changed my algorithm to something like this (in pseudo code):

for (cl_ulong chunk = 0; chunk < num_chunks; chunk += chunk_size)
{
  /* set kernel arguments that are different for each chunk */
  clSetKernelArg(/* ... */);

  /* schedule kernel for next execution */
  clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL, &global_work_size, NULL, 0, NULL, NULL);

  /* read out the results from kernel and append them to output array on host */
  clEnqueueReadBuffer(cmd_queue, of_buf, CL_TRUE, 0, chunk_size, output + chunk, 0, NULL, NULL);
}

So now I split the whole workload on host and send it to GPU in chunks. For each chunk of data I enqueue a new kernel and the results that I get from it are appended to the output array at a correct offset.

Is this how you meant that the calculation should be divided ?

This seems like the way to remedy the freeze problem and even more now I am able to process data much larger than the available GPU memory, but I will yet have to make some good performance meassurements, to see what is the good chunk size...

回答1:

Whenever a GPU is running an OpenCL kernel it is completely dedicated to OpenCL. Some modern Nvidia GPUs are the exception, I think from the GeForce GTX 500 series onwards, which could run multiple kernels if those kernels did not use all available compute units.

Your solutions are to divide your calculations into multiple short kernel calls, which is the best all round solution since it will work even on single GPU machines, or to invest in a cheap GPU for driving your display.

If you are going to run long kernels on your GPUs then you must disable timeout detection and recovery for GPUs or make the timeout delay longer than the maximum kernel runtime (better as bugs can still be caught), see here for how to do this.

回答2:

Every time I have had a display freeze or "Display driver stopped responding and has recovered" it's been due to a bug. It can freeze the whole system and the only thing I can do is reset. Instead, now I develop on the CPU first. This never crashes my whole system. It's easier to debug this way as well since I can use printf. Once I got my code working bug free on the CPU I try it on the GPU.

回答3:

I am new to opencl and encountered a similar problem. I found a short calculation works OK, but a longer one freezes the mouse cursor. For my problem, Windows leaves a yellow triangle in the tray area, and puts a message in the event log about "Display driver stopped responding and has recovered". The solution I found is to break up the calculation into small parts that take no more than a couple of seconds each. These run back to back, yet apparently let the video driver in enough to keep it happy. If I set global_work_size to a value high enough to maximize throughput, the video response is painfully slow, but the driver restart/mouse freeze problem never occurs.