I'm new to OpenCL and I'm seeing a strange issue. I have a reduction kernel that I run several times. When I profile the kernel with OpenCL events, the elapsed time (queued -> end) stays almost the same, increasing only slightly. But when I measure the elapsed time in the C++ host code, the time for the "clEnqueueNDRangeKernel" call (measured up to the return of clFinish) increases rapidly. I have attached both the code and the profiling output.
// execute the kernel
globalWorkSize[0] = this->reduction_NumBlocks * this->reduction_NumThreads;
localWorkSize[0] = this->reduction_NumThreads;
//Start Time
ttt.start();
clErrNum = clEnqueueNDRangeKernel(clCommandQueue, kernelReduction, 1, 0,
                                  globalWorkSize, localWorkSize, 0, NULL, &timing_event);
// check if kernel execution generated an error
oclCheckError(clErrNum, CL_SUCCESS);
clFinish(clCommandQueue);
ttt.stop();
//Check Elapsed Time
clGetEventProfilingInfo(timing_event, CL_PROFILING_COMMAND_QUEUED,
                        sizeof(time_start), &time_start, NULL);
clGetEventProfilingInfo(timing_event, CL_PROFILING_COMMAND_END,
                        sizeof(time_end), &time_end, NULL);
cout << "ElapseTime(Execute):" << (time_end - time_start) / 1000 << "us\tTTT:"
     << ttt.getElapsedTimeInMicroSec() << endl;
And this is the output:
GeForce GTX 550 Ti
Device Timer Resolution:1000ns
GpuExecutionTime:160us C++ElapsedTime:177
GpuExecutionTime:156us C++ElapsedTime:167
GpuExecutionTime:156us C++ElapsedTime:166
GpuExecutionTime:189us C++ElapsedTime:242
GpuExecutionTime:158us C++ElapsedTime:215
...
GpuExecutionTime:156us C++ElapsedTime:253
GpuExecutionTime:162us C++ElapsedTime:261
GpuExecutionTime:157us C++ElapsedTime:262
GpuExecutionTime:156us C++ElapsedTime:254
GpuExecutionTime:157us C++ElapsedTime:254
GpuExecutionTime:160us C++ElapsedTime:261
GpuExecutionTime:167us C++ElapsedTime:279
GpuExecutionTime:157us C++ElapsedTime:264
...
GpuExecutionTime:159us C++ElapsedTime:263
GpuExecutionTime:157us C++ElapsedTime:261
GpuExecutionTime:157us C++ElapsedTime:260
GpuExecutionTime:157us C++ElapsedTime:263
GpuExecutionTime:264us C++ElapsedTime:384
...
GpuExecutionTime:156us C++ElapsedTime:304
GpuExecutionTime:161us C++ElapsedTime:314
GpuExecutionTime:157us C++ElapsedTime:308
GpuExecutionTime:160us C++ElapsedTime:305
GpuExecutionTime:158us C++ElapsedTime:311
GpuExecutionTime:156us C++ElapsedTime:308
GpuExecutionTime:157us C++ElapsedTime:312
...
GpuExecutionTime:157us C++ElapsedTime:326
GpuExecutionTime:158us C++ElapsedTime:326
GpuExecutionTime:159us C++ElapsedTime:330
GpuExecutionTime:158us C++ElapsedTime:328
GpuExecutionTime:158us C++ElapsedTime:335
Any kind of help is appreciated.
P.S. The size of the input and the other related variables are fixed.
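To narrow down where the extra host time goes, the event timestamps could be split into stages instead of only looking at queued -> end. Below is a minimal sketch of that arithmetic. It assumes the four values are read with clGetEventProfilingInfo using CL_PROFILING_COMMAND_QUEUED, CL_PROFILING_COMMAND_SUBMIT, CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END (all in nanoseconds). The names EventTimestamps, EventBreakdown, breakdown and host_overhead_us are hypothetical helpers and are not part of my code above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the four profiling timestamps of one event.
// In the real program they would be filled by clGetEventProfilingInfo
// (QUEUED, SUBMIT, START, END), which reports nanoseconds.
struct EventTimestamps {
    std::uint64_t queued;
    std::uint64_t submit;
    std::uint64_t start;
    std::uint64_t end;
};

// Per-stage durations in microseconds.
struct EventBreakdown {
    double queued_to_submit_us;  // time spent in the host-side queue
    double submit_to_start_us;   // time waiting on the device before running
    double start_to_end_us;      // actual kernel execution
    double queued_to_end_us;     // same interval my code above prints
};

// Split one event's timestamps into stages (assumes they are monotonic).
inline EventBreakdown breakdown(const EventTimestamps& t) {
    assert(t.queued <= t.submit && t.submit <= t.start && t.start <= t.end);
    auto us = [](std::uint64_t from, std::uint64_t to) {
        return static_cast<double>(to - from) / 1000.0;  // ns -> us
    };
    return EventBreakdown{us(t.queued, t.submit), us(t.submit, t.start),
                          us(t.start, t.end), us(t.queued, t.end)};
}

// Host time that the event timestamps do not account for, e.g. work done
// inside clEnqueueNDRangeKernel before queuing, or inside clFinish after
// the kernel has ended.
inline double host_overhead_us(const EventBreakdown& b, double host_elapsed_us) {
    return host_elapsed_us - b.queued_to_end_us;
}
```

Printing host_overhead_us for every iteration next to the three stage durations should show whether the growth is in one of the event stages or entirely outside the queued -> end window.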