I cannot understand the CUDA documentation in orde

I am trying to understand how to use the math functions from the CUDA library. I use this documentation: https://docs.nvidia.com/cuda/cuda-math-api/

I am going to describe my specific problem, but I think it generalizes to any function from the CUDA library.

I have this piece of code:

   double diff[(Ni+2)*(Nj+2)];

   .
   .
   .

   for (i=1; i<=Ni; i++){
       for (j=1; j<=Nj; j++){
           diff[i*(Nj+2) + j] = fabs(value1[i*(Nj+2) + j] - value2[i*(Nj+2) + j]);
       }
   }

This works fine when I compile and run it on a CPU.

I then want to run this code on a GPU, so I wrote this kernel:

__global__ void deviceDiffKernel(int *in_1, int *in_2 , int *out, int N) {

    int idx = blockIdx.x*blockDim.x + threadIdx.x + 1;
    int idy = blockIdx.y*blockDim.y + threadIdx.y + 1;

    out[idy*N + idx] = fabs(in_1[idy*N + idx] - in_2[idy*N + idx]);

}

Here I cannot use the std::fabs function (the compiler returns an error):

error: calling a __host__ function("std::fabs ") from a __global__ function("deviceDeltaKernel") is not allowed

error: identifier "std::fabs " is undefined in device code

The documentation at the link above says to use this function:

__device__ double fabs(double x);

Of course, I cannot call it from the kernel like this:

out[idy*N + idx] = __device__ fabs(in_1[idy*N + idx] - in_2[idy*N + idx]);

or like this:

double out[idy*N + idx] = in_1[idy*N + idx] - in_2[idy*N + idx];
__device__ fabs(out[idy*N + idx]);

Can somebody show me how to use it, then?

*This is quite general and applies equally to all the functions in the CUDA Math API linked above.

The kernel will compile if you cast the argument to the type indicated in the CUDA math API documentation:

#include <math.h>
__global__ void deviceDiffKernel(int *in_1, int *in_2 , int *out, int N) {

    int idx = blockIdx.x*blockDim.x + threadIdx.x + 1;
    int idy = blockIdx.y*blockDim.y + threadIdx.y + 1;

    out[idy*N + idx] = fabs((double)(in_1[idy*N + idx] - in_2[idy*N + idx]));

}

Your argument is an integer type (the difference of two int values). The compiler looks for the closest matching function prototype. Since the CUDA math API does not provide __device__ double fabs(int);, the compiler chooses a matching prototype from std, and that one isn't usable in device code.
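If the data really is integer, as the kernel's int * parameters suggest, an alternative to the cast is the integer overload abs(int), which CUDA supports in device code. The following is a minimal sketch under that assumption. The names absDiffInt and absDiffIntOnGpu are illustrative and not from the question, a flat 1D index with a bounds check stands in for the question's 2D indexing, and error checking is omitted for brevity.

```cuda
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Element-wise |a - b| for integer data. abs(int) is usable in device code,
// so the values never have to be converted to double.
__global__ void absDiffInt(const int *a, const int *b, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // guard: the grid may contain more threads than elements
        out[i] = abs(a[i] - b[i]);
    }
}

// Hypothetical host helper: copies the inputs to the GPU, launches the
// kernel, and copies the result back. Assumes a and b have the same,
// non-zero length. Error checking is omitted for brevity.
std::vector<int> absDiffIntOnGpu(const std::vector<int> &a,
                                 const std::vector<int> &b) {
    int n = static_cast<int>(a.size());
    size_t bytes = n * sizeof(int);

    int *dA, *dB, *dOut;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // round up to cover all n
    absDiffInt<<<blocks, threads>>>(dA, dB, dOut, n);

    std::vector<int> out(n);
    cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dOut);
    return out;
}
```

Whether this or the cast to double is preferable depends on what the arrays are meant to hold; the original CPU code works on double values, in which case the kernel's parameter types are the thing to change.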

As a general rule for these types of questions, regardless of which function you are using from the CUDA Math API, start by making sure all types (arguments, return value) match the types given for the function prototype in the math API documentation. Also note that the math API often has different functions available for floating-point operations on float vs. double type (for example, fabsf for float and fabs for double). Some math API functions may even support a mixture of argument types, but it's still necessary to get a "match" for the compiler to identify the correct function to use.
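To illustrate the float vs. double point, here is a small sketch with one kernel per precision: fabsf where everything is float, fabs where everything is double. The kernel names and the runAbsDiff host helper are made up for this example, and error checking is left out to keep it short.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Single precision: fabsf takes a float and returns a float.
__global__ void absDiffFloat(const float *a, const float *b, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = fabsf(a[i] - b[i]);
}

// Double precision: fabs takes a double and returns a double.
__global__ void absDiffDouble(const double *a, const double *b, double *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = fabs(a[i] - b[i]);
}

// Hypothetical host helper shared by both kernels. Assumes a and b have
// the same, non-zero length. Error checking is omitted for brevity.
template <typename T>
std::vector<T> runAbsDiff(void (*kernel)(const T *, const T *, T *, int),
                          const std::vector<T> &a, const std::vector<T> &b) {
    int n = static_cast<int>(a.size());
    size_t bytes = n * sizeof(T);

    T *dA, *dB, *dOut;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // round up to cover all n
    kernel<<<blocks, threads>>>(dA, dB, dOut, n);

    std::vector<T> out(n);
    cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dOut);
    return out;
}
```

In both kernels the argument type, the function, and the destination type agree, which is the "match" described above.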

Regarding your other attempts, drop the __device__ decorator. It belongs on a function's declaration or definition and is not written when calling the function.
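Since the original CPU loop works on double arrays, one way to port it is a kernel with double * parameters that calls fabs directly, with no cast and no __device__ at the call site. This is only a sketch under a few assumptions: the padded (Ni+2) x (Nj+2) layout from the question, a 16 x 16 thread block, and the illustrative names deviceDiffKernelDouble and diffOnGpu. A bounds check is added because the grid is rounded up, and error checking is omitted.

```cuda
#include <vector>
#include <cuda_runtime.h>

// |in1 - in2| over the interior of a padded (Ni+2) x (Nj+2) grid,
// mirroring the CPU loop: i runs over 1..Ni and j over 1..Nj.
__global__ void deviceDiffKernelDouble(const double *in1, const double *in2,
                                       double *out, int Ni, int Nj) {
    int j = blockIdx.x * blockDim.x + threadIdx.x + 1;  // column index, from 1
    int i = blockIdx.y * blockDim.y + threadIdx.y + 1;  // row index, from 1
    if (i <= Ni && j <= Nj) {  // skip threads that fall outside the interior
        int k = i * (Nj + 2) + j;
        out[k] = fabs(in1[k] - in2[k]);  // plain call, no __device__ here
    }
}

// Hypothetical host wrapper. Assumes v1 and v2 each hold (Ni+2)*(Nj+2)
// values and that Ni, Nj >= 1. Error checking is omitted for brevity.
std::vector<double> diffOnGpu(const std::vector<double> &v1,
                              const std::vector<double> &v2, int Ni, int Nj) {
    size_t count = static_cast<size_t>(Ni + 2) * (Nj + 2);
    size_t bytes = count * sizeof(double);

    double *d1, *d2, *dOut;
    cudaMalloc(&d1, bytes);
    cudaMalloc(&d2, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(d1, v1.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d2, v2.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemset(dOut, 0, bytes);  // border cells are never written by the kernel

    dim3 block(16, 16);
    dim3 grid((Nj + block.x - 1) / block.x,   // x covers columns (j)
              (Ni + block.y - 1) / block.y);  // y covers rows (i)
    deviceDiffKernelDouble<<<grid, block>>>(d1, d2, dOut, Ni, Nj);

    std::vector<double> out(count);
    cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d1);
    cudaFree(d2);
    cudaFree(dOut);
    return out;
}
```

Calling diffOnGpu(value1, value2, Ni, Nj) from host code would then fill the interior of the result the same way the CPU loop fills diff; the border cells are set to zero here, which is a choice made for this sketch.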