I want to measure the performance of different devices, viz. CPUs and GPUs. This is my kernel code:
    __kernel void dataParallel(__global int* A)
    {
        sleep(10);
        A[0] = 2;
        A[1] = 3;
        A[2] = 5;
        int pnp;    // pnp = probable next prime
        int pprime; // previous prime
        int i, j;
        for (i = 3; i < 10; i++)
        {
            j = 0;
            pprime = A[i - 1];
            pnp = pprime + 2;
            while ((j < i) && A[j] <= sqrt((float)pnp))
            {
                if (pnp % A[j] == 0)
                {
                    pnp += 2;
                    j = 0;
                }
                j++;
            }
            A[i] = pnp;
        }
    }
However, the sleep() function doesn't work. I am getting the following error in the build log:
<kernel>:4:2: warning: implicit declaration of function 'sleep' is invalid in C99
sleep(10);
builtins: link error: Linking globals named '__gpu_suld_1d_i8_trap': symbol multiply defined!
Is there any other way to implement this function? Also, is there a way to record the time taken to execute this code snippet?
P.S. I have included #include <unistd.h> in my host code.
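You don't need to use sleep in your kernel to measure the execution time. sleep() is a host-side function that OpenCL C does not provide, and including unistd.h in the host code has no effect on the kernel source, which is compiled separately.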
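There are two ways to measure the time:
1. Use OpenCL's built-in profiling: create the command queue with CL_QUEUE_PROFILING_ENABLE, then read CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END from the kernel's event with clGetEventProfilingInfo (see the OpenCL API reference).
2. Get timestamps in your host code before and after execution and compare them. Example: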
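A minimal sketch of the host-side approach, assuming a POSIX system. The `getTimeinMs()` here is a stand-in built on `clock_gettime`, and `time_blocking_work_ms` and `placeholder_work` are hypothetical names used only for illustration. In a real program the timed callback would call `clEnqueueNDRangeKernel` followed by `clFinish`, so that the second timestamp is taken only after the kernel has actually finished rather than right after it was queued.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <time.h>

/* Stand-in timer (assumption): monotonic time in milliseconds. */
static double getTimeinMs(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1000.0 + (double)ts.tv_nsec / 1.0e6;
}

/* Hypothetical callback type: the work must block until it is complete. */
typedef void (*blocking_work_fn)(void *user_data);

/* Take a timestamp before and after the work and return the difference. */
static double time_blocking_work_ms(blocking_work_fn work, void *user_data)
{
    double start = getTimeinMs();
    work(user_data);              /* e.g. enqueue the kernel, then clFinish */
    double end = getTimeinMs();
    return end - start;
}

/* Placeholder workload (assumption): stands in for the OpenCL calls
   clEnqueueNDRangeKernel(...) and clFinish(queue). */
static void placeholder_work(void *user_data)
{
    volatile unsigned long long *sum = user_data;
    for (unsigned long long i = 0; i < 1000000ULL; ++i)
        *sum += i;
}
```

Calling `time_blocking_work_ms(placeholder_work, &sum)` returns the elapsed milliseconds for the placeholder loop. Note that this measures the whole round trip seen by the host, including queueing overhead, which is why the event profiling route tends to give tighter numbers for the kernel alone.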
Where getTimeinMs() is a function that returns a double value in milliseconds (Windows-specific; substitute another implementation if you don't use Windows):
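One possible Windows implementation, as a sketch rather than the original answer's code: it uses `QueryPerformanceFrequency` and `QueryPerformanceCounter` from `<windows.h>`. The non-Windows branch is an assumption added only so the snippet still compiles elsewhere; it relies on C11 `timespec_get`, which reports calendar time and is therefore not guaranteed to be monotonic.

```c
#ifdef _WIN32
#include <windows.h>

/* Windows: high-resolution performance counter converted to milliseconds. */
static double getTimeinMs(void)
{
    LARGE_INTEGER freq, now;
    QueryPerformanceFrequency(&freq);   /* counter ticks per second */
    QueryPerformanceCounter(&now);      /* current tick count */
    return (double)now.QuadPart * 1000.0 / (double)freq.QuadPart;
}
#else
#include <time.h>

/* Fallback (assumption): C11 calendar time converted to milliseconds. */
static double getTimeinMs(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec * 1000.0 + (double)ts.tv_nsec / 1.0e6;
}
#endif
```

Only differences between two calls are meaningful; the absolute value depends on the platform's time base.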
Also you want to:
For Mac it would be (could work on Linux as well, not sure):
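A sketch of what this could look like, assuming `gettimeofday` from `<sys/time.h>` is the intended call. It is a POSIX function available on both macOS and Linux. It reports wall-clock time, so it can jump if the system clock is adjusted; `clock_gettime(CLOCK_MONOTONIC, ...)` is a common alternative where that matters.

```c
#include <stddef.h>
#include <sys/time.h>

/* POSIX: wall-clock time in milliseconds, with microsecond resolution. */
static double getTimeinMs(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec * 1000.0 + (double)tv.tv_usec / 1000.0;
}
```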