I am trying to understand how to use the math functions from the CUDA library. I use this documentation: https://docs.nvidia.com/cuda/cuda-math-api/
I am going to describe my specific problem, but I think it can be generalized to any function from the CUDA Math API.
I have this piece of code:
double diff[(Ni+2)*(Nj+2)];
...
for (i=1; i<=Ni; i++){
    for (j=1; j<=Nj; j++){
        diff[i*(Nj+2) + j] = fabs(value1[i*(Nj+2) + j] - value2[i*(Nj+2) + j]);
    }
}
This works fine when I compile and run it on the CPU.
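(For reference, I build the CPU version with something like the line below; the file name is just a placeholder and the exact flags probably do not matter.)
gcc -O2 diff_cpu.c -lm -o diff_cpu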
Then I want to run this code on a GPU and thus I create this kernel:
__global__ void deviceDiffKernel(double *in_1, double *in_2, double *out, int N) {
    // N is the row stride, i.e. Nj+2 in the CPU version
    int idx = blockIdx.x*blockDim.x + threadIdx.x + 1;
    int idy = blockIdx.y*blockDim.y + threadIdx.y + 1;
    out[idy*N + idx] = fabs(in_1[idy*N + idx] - in_2[idy*N + idx]);
}
Here I cannot use the std::fabs function; the compiler returns these errors:
error: calling a __host__ function("std::fabs") from a __global__ function("deviceDiffKernel") is not allowed
error: identifier "std::fabs" is undefined in device code
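I compile the GPU version with a plain nvcc invocation, roughly like this (file name and flags are placeholders, and I doubt they matter here):
nvcc diff_gpu.cu -o diff_gpu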
The documentation at the link above says to use this function:
__device__ double fabs(double x);
Of course, I cannot call it from the kernel like this:
out[idy*N + idx] = __device__ fabs(in_1[idy*N + idx] - in_2[idy*N + idx]);
or like this:
double out[idy*N + idx] = in_1[idy*N + idx] - in_2[idy*N + idx];
__device__ fabs(out[idy*N + idx]);
Can somebody indicate how I can use it, then?
*This is quite general and applies the same to all the functions in the CUDA Math API documentation linked above.