Whenever I plot the values obtained by a program that uses cuFFT and compare the results with those from Matlab, I get graphs of the same shape, with the maxima and minima at the same points. However, the values produced by cuFFT are much greater than those produced by Matlab. The Matlab code is:
fs = 1000; % sample freq
D = [0:1:4]'; % pulse delay times
t = 0 : 1/fs : 4000/fs; % signal evaluation time
w = 0.5; % width of each pulse
yp = pulstran(t,D,'rectpuls',w);
filt = conj(fliplr(yp));
xx = fft(yp,1024).*fft(filt,1024);
xx = (abs(ifft(xx)));
and the CUDA code, with the same input, is essentially:
cufftExecC2C(plan, (cufftComplex *)d_signal, (cufftComplex *)d_signal, CUFFT_FORWARD);
cufftExecC2C(plan, (cufftComplex *)d_filter_signal, (cufftComplex *)d_filter_signal, CUFFT_FORWARD);
ComplexPointwiseMul<<<blocksPerGrid, threadsPerBlock>>>(d_signal, d_filter_signal, NX);
cufftExecC2C(plan, (cufftComplex *)d_signal, (cufftComplex *)d_signal, CUFFT_INVERSE);
The cuFFT code also performs a 1024-point FFT with a batch size of 2. Even after applying the scaling factor NX = 1024, the values do not come out correct. Please tell me what to do.
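A likely reason for a constant gap is the normalization convention. MATLAB's ifft divides by N. cuFFT's transforms are unnormalized, so a forward transform followed by CUFFT_INVERSE returns N times the input. The host-only C++ sketch below does not need a GPU. It computes the same circular convolution in both styles using a naive DFT, so the factor of N can be checked on a small case. The function names are illustrative and do not belong to either library.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// Naive O(N^2) DFT. sign = -1 gives the forward transform, sign = +1 the
// inverse. Neither direction divides by N, which matches the cuFFT
// convention for CUFFT_FORWARD and CUFFT_INVERSE.
cvec dft_unnormalized(const cvec& x, int sign) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    cvec y(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> acc = 0.0;
        for (std::size_t j = 0; j < n; ++j) {
            // Reduce j*k modulo n to keep the angle small and accurate.
            const double angle = sign * 2.0 * pi * double(j * k % n) / double(n);
            acc += x[j] * std::polar(1.0, angle);
        }
        y[k] = acc;
    }
    return y;
}

// cuFFT-style pipeline: forward(a) .* forward(b), then an unnormalized inverse.
cvec circular_conv_cufft_style(const cvec& a, const cvec& b) {
    assert(a.size() == b.size());
    cvec fa = dft_unnormalized(a, -1);
    const cvec fb = dft_unnormalized(b, -1);
    for (std::size_t k = 0; k < fa.size(); ++k) fa[k] *= fb[k];
    return dft_unnormalized(fa, +1);
}

// MATLAB-style pipeline, ifft(fft(a).*fft(b)): the same, but divided by N.
cvec circular_conv_matlab_style(const cvec& a, const cvec& b) {
    cvec y = circular_conv_cufft_style(a, b);
    for (auto& v : y) v /= double(y.size());
    return y;
}
```

Convolving {1, 2, 3, 4} with the unit impulse {1, 0, 0, 0} gives {1, 2, 3, 4} in the MATLAB style and {4, 8, 12, 16} in the cuFFT style. You do not need a separate scaling pass on the GPU. NVIDIA's simpleCUFFT sample avoids one by folding the 1/N factor into its ComplexPointwiseMulAndScale kernel.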
PERFORMANCE
I'm adding a further answer to compare the callback performance with the non-callback version of the same code for this particular case of IFFT scaling. The code I'm using is
For such large 1D arrays and simple processing (scaling), the timing on a Kepler K20c is the following
So, there is not much improvement. I expect that the small improvement one sees comes from the callback version avoiding the separate scaling kernel call that the non-callback version needs. For smaller 1D arrays, there is either no improvement or the non-callback version runs faster.
With the introduction of the cuFFT callback feature, the normalization required by the inverse FFT performed by cuFFT can be embedded directly within the cufftExecC2C call by defining the normalization operation as a __device__ function. Besides the cuFFT User Guide, for the cuFFT callback features see the post "CUDA Pro Tip: Use cuFFT Callbacks for Custom Data Processing".
Below is an example of implementing the IFFT normalization by cuFFT callback.
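The following is a hypothetical, host-only sketch of the same idea, so it compiles without a GPU. An emulated inverse transform passes each output element to a store function instead of writing it directly. That function has the same parameter list as cuFFT's store callback type cufftCallbackStoreC: (void *dataOut, size_t offset, cufftComplex element, void *callerInfo, void *sharedPointer). It divides the element by N before storing it. The names StoreCallback, store_normalized and inverse_dft_with_store_callback are made up for illustration, and std::complex<float> stands in for cufftComplex.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cpx = std::complex<float>;

// Host-side stand-in for cuFFT's cufftCallbackStoreC. The real callback is a
// __device__ function taking a cufftComplex element.
using StoreCallback = void (*)(void* dataOut, std::size_t offset, cpx element,
                               void* callerInfo, void* sharedPointer);

// Normalization "callback": divides each output element by N before storing it.
// N is passed through callerInfo.
void store_normalized(void* dataOut, std::size_t offset, cpx element,
                      void* callerInfo, void* /*sharedPointer*/) {
    const float n = *static_cast<const float*>(callerInfo);
    static_cast<cpx*>(dataOut)[offset] = element / n;
}

// Emulated inverse transform: an unnormalized inverse DFT that hands every
// output element to the store callback instead of writing it directly.
void inverse_dft_with_store_callback(const std::vector<cpx>& in, std::vector<cpx>& out,
                                     StoreCallback cb, void* callerInfo) {
    assert(cb != nullptr);
    const std::size_t n = in.size();
    const double pi = std::acos(-1.0);
    out.assign(n, cpx(0.0f, 0.0f));
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> acc = 0.0;
        for (std::size_t j = 0; j < n; ++j) {
            const double angle = 2.0 * pi * double(j * k % n) / double(n);
            acc += std::complex<double>(in[j]) * std::polar(1.0, angle);
        }
        cb(out.data(), k, cpx(acc), callerInfo, nullptr);
    }
}
```

With a real GPU, the steps are typically as follows:

1. The callback is a __device__ function.
2. Its device address is copied to the host, for example with cudaMemcpyFromSymbol.
3. It is registered with cufftXtSetCallback(plan, (void **)&h_callback, CUFFT_CB_ST_COMPLEX, (void **)&d_callerInfo) before calling cufftExecC2C.

With the classic callback API this also requires compiling with relocatable device code and linking the static cuFFT library.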
EDIT
The "moment" at which the callback operation is performed is specified by CUFFT_CB_ST_COMPLEX in the call to cufftXtSetCallback. Notice that you can also attach both load and store callbacks to the same cuFFT plan.
This is a late answer to remove this question from the unanswered list.
You are not giving enough information to diagnose your problem, since you do not specify how you set up the cuFFT plan. You do not even specify whether the Matlab and cuFFT signals have exactly the same shape (so that they differ only by a scaling) or only approximately the same shape. However, let me make the following two observations:
The yp vector has 4001 elements (t runs from 0 to 4000/fs in steps of 1/fs). By contrast, with fft(yp,1024) you are performing an FFT that truncates the signal to its first 1024 elements, and the same holds for fft(filt,1024).
For the sake of convenience (it could be useful to other users), I'm reporting below a simple FFT-IFFT scheme that also includes the scaling, performed using the CUDA Thrust library.
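As a hedged, host-only sketch of such a scheme, the code below does three things:

1. It mimics MATLAB's fft(x, n) input handling by keeping the first n samples and zero-padding if there are fewer.
2. It runs an unnormalized forward and inverse DFT, as cuFFT does.
3. It applies the 1/n scaling with std::transform and a functor. On the device, thrust::transform would play the same role.

The names fit_length, dft, ScaleBy and fft_ifft_scaled are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cpx = std::complex<float>;

// Mirrors MATLAB's fft(x, n) input handling: keep the first n samples of x,
// zero-padding when x has fewer than n samples.
std::vector<cpx> fit_length(const std::vector<cpx>& x, std::size_t n) {
    std::vector<cpx> y(n, cpx(0.0f, 0.0f));
    std::copy_n(x.begin(), std::min(n, x.size()), y.begin());
    return y;
}

// Unnormalized DFT (sign = -1 forward, +1 inverse), as cuFFT computes it.
// Accumulates in double to limit rounding error.
std::vector<cpx> dft(const std::vector<cpx>& x, int sign) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    std::vector<cpx> y(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> acc = 0.0;
        for (std::size_t j = 0; j < n; ++j) {
            const double angle = sign * 2.0 * pi * double(j * k % n) / double(n);
            acc += std::complex<double>(x[j]) * std::polar(1.0, angle);
        }
        y[k] = cpx(acc);
    }
    return y;
}

// Scaling functor. On the GPU, a functor with a __host__ __device__
// operator() acting on the cuFFT output would be passed to thrust::transform.
struct ScaleBy {
    float factor;
    cpx operator()(const cpx& v) const { return v * factor; }
};

// FFT followed by IFFT of the first n samples, with the 1/n normalization
// applied explicitly after the inverse transform.
std::vector<cpx> fft_ifft_scaled(const std::vector<cpx>& signal, std::size_t n) {
    assert(n > 0);
    std::vector<cpx> y = dft(dft(fit_length(signal, n), -1), +1);
    std::transform(y.begin(), y.end(), y.begin(), ScaleBy{1.0f / float(n)});
    return y;
}
```

For example, a 5-sample signal round-tripped with n = 3 returns only its first 3 samples. With n = 8 it returns the 5 samples followed by 3 zeros. This mirrors how fft(yp,1024) silently discards most of the 4001-sample yp.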