Timed interval always evaluates to zero

Posted 2019-09-05 06:43

Question:

My host code looks like this:

#include <time.h>

clock_t start, finish;
start = clock();
ret = clEnqueueNDRangeKernel(.........);
finish = clock();
double seconds = (double)(finish - start) / (double)CLOCKS_PER_SEC;

Why is finish - start always 0? Is it because of low resolution, or is there something wrong with my timer code?

Answer 1:

Enqueuing a kernel is very cheap, because the call can return before the kernel has even started executing.

You could pass the event returned by clEnqueueNDRangeKernel to clWaitForEvents, which blocks until the kernel has actually finished executing.
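The same effect can be reproduced without OpenCL. The C++ sketch below is only an analogy, not OpenCL code: a hypothetical `fake_kernel` that sleeps stands in for the kernel, `std::async` with `std::launch::async` plays the role of the non-blocking enqueue, and `std::future::wait` plays the role of `clWaitForEvents`. Timing only the launch gives a near-zero result, while timing up to the wait includes the work.

```cpp
#include <chrono>
#include <future>
#include <thread>

// Hypothetical stand-in for a kernel: it only sleeps for the given duration.
void fake_kernel(std::chrono::milliseconds work) {
    std::this_thread::sleep_for(work);
}

struct Timings {
    double enqueue_ms;  // time spent in the launch call alone
    double total_ms;    // time until the work has actually completed
};

Timings time_async(std::chrono::milliseconds work) {
    using Clock = std::chrono::steady_clock;
    using Ms = std::chrono::duration<double, std::milli>;

    auto start = Clock::now();
    // Analogous to clEnqueueNDRangeKernel: std::launch::async starts the work
    // on another thread and returns without waiting for it to finish.
    std::future<void> done = std::async(std::launch::async, fake_kernel, work);
    auto after_launch = Clock::now();

    // Analogous to clWaitForEvents: block until the work has completed.
    done.wait();
    auto after_wait = Clock::now();

    return Timings{Ms(after_launch - start).count(), Ms(after_wait - start).count()};
}
```

Calling `time_async(std::chrono::milliseconds(200))` typically reports an `enqueue_ms` close to zero and a `total_ms` of at least 200, which mirrors the question's `finish - start` of zero.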



Answer 2:

clEnqueueNDRangeKernel only queues the kernel to run. Unlike the traditional C code most people are used to debugging, OpenCL is not a serial process. To force your code to behave serially, you can either make the calls blocking (where available; see the blocking_write and blocking_read flags of clEnqueueWriteBuffer and clEnqueueReadBuffer) or insert a clFinish() after each OpenCL command that uses a cl_command_queue. clFinish() blocks until all previously queued commands in that cl_command_queue have completed.

With the commands serialized this way, you can time them with ordinary host timers.
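A minimal sketch of such a host timer, assuming C++11 or later; the helper name `wall_time_ms` is hypothetical. It uses `std::chrono::steady_clock` rather than `clock()`, because `clock()` reports the processor time used by the host program, which need not advance while the host is blocked waiting for the device.

```cpp
#include <chrono>
#include <utility>

// Measures the wall-clock time, in milliseconds, taken by calling f().
// steady_clock is monotonic, so it is not affected by system clock changes.
template <typename F>
double wall_time_ms(F&& f) {
    auto start = std::chrono::steady_clock::now();
    std::forward<F>(f)();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

In the OpenCL case the callable would enqueue the kernel and then call clFinish on the same queue, for example `wall_time_ms([&] { clEnqueueNDRangeKernel(...); clFinish(queue); })`, where `queue` is a placeholder for your command queue and the enqueue arguments are the same as in the question.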

Others have mentioned profiling events, which are the intended way to profile OpenCL calls.



Answer 3:

As others have already pointed out, clEnqueueNDRangeKernel is non-blocking and nothing in your code waits for it, so you are not measuring the kernel's execution time: the enqueue function returns without any guarantee that the kernel has finished executing, or even started. Instead, you can pass a pointer to an event to the enqueue call and then query that event for the kernel's start and end times. Using the C++ wrapper:

cl::Event timingEvent;
cl_ulong start_time, end_time;  // device timestamps, in nanoseconds
queue_0.enqueueNDRangeKernel(mx_kernel, cl::NullRange, global, local, NULL, &timingEvent);
queue_0.finish();  // wait for the kernel to finish executing
timingEvent.getProfilingInfo(CL_PROFILING_COMMAND_START, &start_time);
timingEvent.getProfilingInfo(CL_PROFILING_COMMAND_END, &end_time);
cl_ulong elapsed = end_time - start_time;  // kernel execution time in nanoseconds

For this to work, you have to enable profiling when you construct the queue:

cl::CommandQueue queue_0 = cl::CommandQueue(context, devices[0], CL_QUEUE_PROFILING_ENABLE);
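The profiling counters are 64-bit device timestamps in nanoseconds, so a conversion step is usually needed before reporting a result. A small sketch, assuming the two values have already been read as above; the function name `profiling_elapsed_ms` is hypothetical, and `std::uint64_t` stands in for `cl_ulong` so the sketch compiles without the OpenCL headers.

```cpp
#include <cstdint>

// Converts a pair of CL_PROFILING_COMMAND_START / CL_PROFILING_COMMAND_END
// timestamps (nanoseconds) into elapsed milliseconds.
// std::uint64_t is used here as a stand-in for cl_ulong.
double profiling_elapsed_ms(std::uint64_t start_ns, std::uint64_t end_ns) {
    // Subtract in integer arithmetic first, so large absolute timestamps do
    // not lose precision, then convert the difference to milliseconds.
    return static_cast<double>(end_ns - start_ns) / 1.0e6;
}
```

Subtracting before converting matters because device timestamps can be very large absolute values, while the interval of interest is usually small.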


Tags: c++ c timer opencl