I want to measure the performance (that is, the runtime) of my kernel code on various devices, namely a CPU and GPUs. The kernel I wrote is:
    __kernel void dataParallel(__global int* A)
    {
        sleep(10);
        A[0]=2;
        A[1]=3;
        A[2]=5;
        int pnp;    // pnp = probable next prime
        int pprime; // previous prime
        int i,j;
        for(i=3;i<500;i++)
        {
            j=0;
            pprime=A[i-1];
            pnp=pprime+2;
            while((j<i) && A[j]<=sqrt((float)pnp))
            {
                if(pnp%A[j]==0)
                {
                    pnp+=2;
                    j=0;
                }
                j++;
            }
            A[i]=pnp;
        }
    }
However, I have been told that it is not possible to use sleep() in kernel code. If that is true, can someone explain why? If it is not, please tell me how to implement it.
Also, as I said, I wish to compare the performance of my CPU and the GPUs. One way to do that is to measure the run time of the kernel on each device. Alternatively, if I could get the kernel to start executing on all the devices at the same time, I would only have to record the end time of execution on each device, and that would serve the purpose as well. Is that possible?
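For reference, a host-side baseline of the same loop might look roughly like the sketch below in plain C. This is only an illustration under assumptions: the names `fill_primes` and `time_fill_primes_ms` are hypothetical, the `sleep(10)` call is left out, the `sqrt` test is replaced by the equivalent integer comparison `A[j]*A[j] <= pnp`, and `clock()` measures host CPU time only, so it says nothing about the run time on a GPU.

```c
#include <time.h>

#define N_PRIMES 500

/* Hypothetical host-side port of the kernel body (no sleep call).
 * Assumption: A[j] <= sqrt(pnp) is rewritten as A[j]*A[j] <= pnp,
 * which is equivalent for positive integers. */
static void fill_primes(int *A)
{
    A[0] = 2;
    A[1] = 3;
    A[2] = 5;
    for (int i = 3; i < N_PRIMES; i++) {
        int j = 0;
        int pnp = A[i - 1] + 2;        /* probable next prime */
        while (j < i && A[j] * A[j] <= pnp) {
            if (pnp % A[j] == 0) {
                pnp += 2;              /* composite: try the next odd number */
                j = 0;                 /* restart; j++ below skips A[0]=2, fine since pnp is odd */
            }
            j++;
        }
        A[i] = pnp;
    }
}

/* Hypothetical helper: host CPU time, in milliseconds, used by one run. */
static double time_fill_primes_ms(int *A)
{
    clock_t start = clock();
    fill_primes(A);
    clock_t end = clock();
    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `time_fill_primes_ms` on an `int` array of `N_PRIMES` elements fills it with the first 500 primes and returns the elapsed CPU time. For a loop this small the resolution of `clock()` may be too coarse, so the measurement would probably have to be repeated many times and averaged.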
Hardware Details:
GPU: AMD FirePro W7000, NVIDIA Tesla C2075
CPU: Intel(R) Xeon(R) CPU X5660 @ 2.80GHz