I have a 3 D grid consisting of 3D blocks. I wish to calculate the individual thread indexes of each coordinates every time the kernel is being called. I have these parameters:
dim3 blocks_query(32,32,32);
dim3 threads_query(32,32,32);
kernel<<< blocks_query,threads_query >>>();
Inside the kernel, I wish to calculate the individual values of x,y and z coordinates for instance, x=0,y=0,z=0, x=0,y=0,z=1, x=0,y=0,z=2,....thanks in advance....
Individual thread indices (x, y, z coordinates) can be calculated inside the kernel as follows:
Keep in mind that the number of threads per block is limited by the GPU. So the block size you have created is invalid.
It equals to 32768 threads per block which is not supported by any of the current CUDA devices. Currently, maximum 1024 threads per block is supported for GPUs of Compute capability 2.0 and above while maximum 512 threads for older GPUs. You should reduce the block size otherwise the kernel would not launch. Another thing to be noted is that you are creating 3D grid which is supported only on CUDA GPUs of Compute 2.0 and above.
UPDATE
Suppose the dimensions of your 3D data are
xDim
,yDim
andzDim
, then a generic grid of thread blocks can be formed as follows:The above approach will create total number of threads equal to or greater than the total data size. The extra threads may cause invalid memory access. So perform bound checks inside the kernel. You can do this by passing
xDim
,yDim
andzDim
as kernel arguments and adding the following line inside the kernel: