I'm trying to compute batch 1D FFTs using `cufftPlanMany`. The data set comes from a 3D field, stored in a 1D array, where I want to compute 1D FFTs in the `x` and `y` directions. The data is stored as shown in the figure below; contiguous in `x`, then `y`, then `z`.
Doing batch FFTs in the `x`-direction is (I believe) straightforward; with input `stride=1`, `distance=nx` and `batch=ny * nz`, it computes the FFTs over elements `{0,1,2,3}`, `{4,5,6,7}`, `...`, `{28,29,30,31}`. However, I can't think of a way to achieve the same for the FFTs in the `y`-direction. A batch for each `xy` plane is again straightforward (input `stride=nx`, `dist=1`, `batch=nx` results in FFTs over `{0,4,8,12}`, `{1,5,9,13}`, etc.). But with `batch=nx * nz`, going from `{3,7,11,15}` to `{16,20,24,28}`, the distance is larger than `1`. Can this somehow be done with `cufftPlanMany`?
I think that the short answer to your question (whether a single `cufftPlanMany` plan can perform 1D FFTs of the columns of a 3D matrix) is NO.

Indeed, transforms performed according to `cufftPlanMany`, which you call like `cufftPlanMany(&plan, rank, n, inembed, istride, idist, onembed, ostride, odist, type, batch)`, must obey the Advanced Data Layout. In particular, 1D FFTs are worked out according to the layout `input[b * idist + x * istride]` and `output[b * odist + x * ostride]`, where `b` addresses the `b`-th signal, `x` indexes the items within a signal, and `istride` is the distance between two consecutive items in the same signal.

If the 3D matrix has dimensions `M * N * Q` and you want to perform 1D transforms along the columns, then the distance between two consecutive elements will be `M`, while the distance between two consecutive signals will be `1`. Furthermore, the number of batched executions must be set equal to `M`. With those parameters, you are able to cover only one slice of the 3D matrix. Indeed, if you try increasing the batch count beyond `M`, cuFFT will start computing new column-wise FFTs starting from the second row. The only solution to this problem is an iterative call to `cufftExecC2C` to cover all the `Q` slices.

For the record, the following code provides a fully worked example of how to perform 1D FFTs of the columns of a 3D matrix.
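As a minimal host-only sketch (not the cuFFT program itself), the index pattern behind the column-wise scheme can be checked in plain C++. The helper names `batch_indices` and `column_indices` are hypothetical, and the field is assumed to be stored with the `M` direction fastest. The sketch only enumerates which array elements each batched signal would read under the rule `offset + b * idist + x * istride`; it performs no FFT.

```cpp
#include <cassert>
#include <vector>

// Element indices touched by one batched 1D plan under the advanced data
// layout: item x of signal b lives at offset + b * idist + x * istride.
std::vector<std::vector<int>> batch_indices(int offset, int n, int istride,
                                            int idist, int batch) {
    assert(n > 0 && batch > 0);
    std::vector<std::vector<int>> signals;
    for (int b = 0; b < batch; ++b) {
        std::vector<int> signal;
        for (int x = 0; x < n; ++x) {
            signal.push_back(offset + b * idist + x * istride);
        }
        signals.push_back(signal);
    }
    return signals;
}

// Column-wise signals of an M x N x Q field (M fastest): one batch of M
// signals per slice, repeated Q times with the base advanced by M * N.
std::vector<std::vector<int>> column_indices(int M, int N, int Q) {
    std::vector<std::vector<int>> all;
    for (int q = 0; q < Q; ++q) {
        // Assumed cuFFT counterpart (not executed here): a plan with
        // istride = M, idist = 1, batch = M, run once per slice on the
        // pointer data + q * M * N.
        std::vector<std::vector<int>> slice =
            batch_indices(q * M * N, N, /*istride=*/M, /*idist=*/1, /*batch=*/M);
        all.insert(all.end(), slice.begin(), slice.end());
    }
    return all;
}
```

With the question's 4 x 4 x 2 example, `column_indices(4, 4, 2)` lists `{0,4,8,12}` through `{3,7,11,15}` for the first slice and continues with `{16,20,24,28}` for the second. By contrast, `batch_indices(0, 4, 4, 1, 5)` places its fifth signal at `{4,8,12,16}`, which starts in the second row and runs into the next slice.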
The situation is different when you want to perform 1D transforms of the rows. In that case, the distance between two consecutive elements is `1`, while the distance between two consecutive signals is `M`. This allows you to set the number of transforms to `N * Q` and then invoke `cufftExecC2C` only once. For the record, the code below provides a full example of 1D transforms of the rows of a 3D matrix.

I guess `idist=nx*ny` could also jump a whole plane, and `batch=nz` would then cover one `yz` plane per call. The decision should be made according to whether `nx` or `nz` is larger.
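The row-wise layout and the plane-jumping alternative for columns can be compared with a small host-only sketch, again enumerating indices rather than running cuFFT. The function names are hypothetical, and the storage is assumed to be `x`-fastest, so one full plane spans `M * N` elements (`nx * ny` in the question's notation).

```cpp
#include <cassert>
#include <vector>

// Element indices touched by one batched 1D plan under the advanced data
// layout: item x of signal b lives at offset + b * idist + x * istride.
std::vector<std::vector<int>> batch_indices(int offset, int n, int istride,
                                            int idist, int batch) {
    assert(n > 0 && batch > 0);
    std::vector<std::vector<int>> signals;
    for (int b = 0; b < batch; ++b) {
        std::vector<int> signal;
        for (int x = 0; x < n; ++x) {
            signal.push_back(offset + b * idist + x * istride);
        }
        signals.push_back(signal);
    }
    return signals;
}

// Row-wise signals: a single batch with istride = 1, idist = M and
// batch = N * Q covers the whole M x N x Q field.
std::vector<std::vector<int>> row_indices(int M, int N, int Q) {
    return batch_indices(0, M, /*istride=*/1, /*idist=*/M, /*batch=*/N * Q);
}

// Column-wise alternative: fix the starting column x0, jump one full plane
// (M * N elements) between signals, and batch over the Q planes. This needs
// M executions instead of Q.
std::vector<std::vector<int>> column_indices_by_plane_jump(int M, int N, int Q) {
    std::vector<std::vector<int>> all;
    for (int x0 = 0; x0 < M; ++x0) {
        std::vector<std::vector<int>> group =
            batch_indices(x0, N, /*istride=*/M, /*idist=*/M * N, /*batch=*/Q);
        all.insert(all.end(), group.begin(), group.end());
    }
    return all;
}
```

For the 4 x 4 x 2 example, `row_indices(4, 4, 2)` runs from `{0,1,2,3}` to `{28,29,30,31}` in one batch, while `column_indices_by_plane_jump(4, 4, 2)` starts with `{0,4,8,12}` and `{16,20,24,28}` for the first column offset. Looping over slices costs `Q` executions and looping over column offsets costs `M`, so the smaller of the two is likely the better loop count.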