I'm trying to speed up a computation for a neural simulator using the FFT.
The equation is:
(1) \sum_{j=1}^{N} w(i - j) \, s_{\mathrm{NMDA}}[j]
where s_NMDA is a vector of length N and w is defined by:
(2) w(j) = \tanh\left(\frac{1}{2 \sigma p}\right) \exp\left(-\frac{|j|}{\sigma p}\right)
where sigma and p are constants.
(is there a better way to render equations on stackoverflow?)
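For concreteness, here is a minimal C sketch of the weight function w from (2), assuming sigma = 1 and p = 32 (the same values as in the Octave test further down):

#include <math.h>
#include <stdlib.h> /* abs */

static const double SIGMA = 1.0;
static const double P = 32.0;

/* w(j) = tanh(1 / (2*sigma*p)) * exp(-|j| / (sigma*p));
   note that it depends only on the absolute distance |j| */
double w(int j)
{
    return tanh(1.0 / (2.0 * SIGMA * P)) * exp(-abs(j) / (SIGMA * P));
}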
The calculation has to be done for all N neurons. Since w in (1) depends only on the absolute distance |i - j|, it should be possible to compute this as a convolution using the FFT (convolution theorem).
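Explicitly, the identity I'm relying on is the convolution theorem for length-N DFTs:

(3) \mathrm{DFT}(x \circledast y)[k] = \mathrm{DFT}(x)[k] \cdot \mathrm{DFT}(y)[k], \quad\text{where}\quad (x \circledast y)[n] = \sum_{j=0}^{N-1} x[j] \, y[(n - j) \bmod N]

Note that the index n - j is taken mod N, so the element-wise product of two DFTs corresponds to the cyclic (wrap-around) convolution.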
I've tried to implement this using FFTW, but the results do not match the expected results. I've never used FFTW before, and now I'm not sure if my implementation is incorrect or if my assumptions about the convolution theorem are false.
#include <stdio.h>
#include <fftw3.h>

void f_I_NMDA_FFT(
    const double **states,          // states[i][6] == s_NMDA[i]
    const unsigned int numNeurons)
{
    fftw_complex *distances, *sNMDAs, *convolution;
    fftw_complex *distances_f, *sNMDAs_f, *convolution_f;
    fftw_plan p, pinv;
    // FFTW's transforms are unnormalized, so scale by 1/N somewhere
    const double scale = 1. / numNeurons;

    distances     = (fftw_complex *)fftw_malloc(sizeof(fftw_complex) * numNeurons);
    sNMDAs        = (fftw_complex *)fftw_malloc(sizeof(fftw_complex) * numNeurons);
    convolution   = (fftw_complex *)fftw_malloc(sizeof(fftw_complex) * numNeurons);
    distances_f   = (fftw_complex *)fftw_malloc(sizeof(fftw_complex) * numNeurons);
    sNMDAs_f      = (fftw_complex *)fftw_malloc(sizeof(fftw_complex) * numNeurons);
    convolution_f = (fftw_complex *)fftw_malloc(sizeof(fftw_complex) * numNeurons);

    // fill input array for distances
    for (unsigned int i = 0; i < numNeurons; ++i)
    {
        distances[i][0] = w(i);  // real part
        distances[i][1] = 0;     // imaginary part
    }

    // fill input array for sNMDAs
    for (unsigned int i = 0; i < numNeurons; ++i)
    {
        sNMDAs[i][0] = states[i][6];
        sNMDAs[i][1] = 0;
    }

    p = fftw_plan_dft_1d(numNeurons, distances, distances_f,
                         FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(p);
    fftw_destroy_plan(p);  // destroy before reusing p, to avoid leaking the plan

    p = fftw_plan_dft_1d(numNeurons, sNMDAs, sNMDAs_f,
                         FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(p);

    // convolution in frequency domain
    for (unsigned int i = 0; i < numNeurons; ++i)
    {
        convolution_f[i][0] = (distances_f[i][0] * sNMDAs_f[i][0]
                             - distances_f[i][1] * sNMDAs_f[i][1]) * scale;
        convolution_f[i][1] = (distances_f[i][0] * sNMDAs_f[i][1]
                             - distances_f[i][1] * sNMDAs_f[i][0]) * scale;
    }

    pinv = fftw_plan_dft_1d(numNeurons, convolution_f, convolution,
                            FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(pinv);

    // compute and compare with expected result
    for (unsigned int i = 0; i < numNeurons; ++i)
    {
        double expected = 0;
        for (int j = 0; j < (int)numNeurons; ++j)
        {
            expected += w((int)i - j) * states[j][6];
        }
        printf("i=%u, FFT: r%f, i%f : Expected: %f\n",
               i, convolution[i][0], convolution[i][1], expected);
    }

    fftw_destroy_plan(p);
    fftw_destroy_plan(pinv);
    fftw_free(distances);
    fftw_free(sNMDAs);
    fftw_free(convolution);
    fftw_free(distances_f);
    fftw_free(sNMDAs_f);
    fftw_free(convolution_f);
}
Here's an example output for 20 neurons:
i=0, FFT: r0.042309, i0.000000 : Expected: 0.041504
i=1, FFT: r0.042389, i0.000000 : Expected: 0.042639
i=2, FFT: r0.042466, i0.000000 : Expected: 0.043633
i=3, FFT: r0.042543, i0.000000 : Expected: 0.044487
i=4, FFT: r0.041940, i0.000000 : Expected: 0.045203
i=5, FFT: r0.041334, i0.000000 : Expected: 0.045963
i=6, FFT: r0.041405, i0.000000 : Expected: 0.046585
i=7, FFT: r0.041472, i0.000000 : Expected: 0.047070
i=8, FFT: r0.041537, i0.000000 : Expected: 0.047419
i=9, FFT: r0.041600, i0.000000 : Expected: 0.047631
i=10, FFT: r0.041660, i0.000000 : Expected: 0.047708
i=11, FFT: r0.041717, i0.000000 : Expected: 0.047649
i=12, FFT: r0.041773, i0.000000 : Expected: 0.047454
i=13, FFT: r0.041826, i0.000000 : Expected: 0.047123
i=14, FFT: r0.041877, i0.000000 : Expected: 0.046656
i=15, FFT: r0.041926, i0.000000 : Expected: 0.046052
i=16, FFT: r0.041294, i0.000000 : Expected: 0.045310
i=17, FFT: r0.042059, i0.000000 : Expected: 0.044430
i=18, FFT: r0.042144, i0.000000 : Expected: 0.043412
i=19, FFT: r0.042228, i0.000000 : Expected: 0.042253
The results seem to be almost correct, but the error increases with the number of neurons. Also, the results seem to be more accurate at the lowest and highest positions i. What's going on here?
Update: As suggested by Oli Charlesworth, I implemented the algorithm in Octave to see whether it's an implementation problem or a math problem:
input = [0.186775; 0.186775; 0.186775; 0.186775; 0.186775; 0; 0.186775; 0.186775; 0.186775; 0.186775];
function ret = _w(i)
ret = tanh(1 / (2 * 1 * 32)) * exp(-abs(i) / (1 * 32));
end
for i = linspace(1, 10, 10)
expected = 0;
for j = linspace(1, 10, 10)
expected += _w(i-j) * input(j);
end
expected
end
distances = _w(transpose(linspace(0, 9, 10)));
input_f = fft(input);
distances_f = fft(distances);
convolution_f = input_f .* distances_f;
convolution = ifft(convolution_f)
Results:
expected = 0.022959
expected = 0.023506
expected = 0.023893
expected = 0.024121
expected = 0.024190
expected = 0.024100
expected = 0.024034
expected = 0.023808
expected = 0.023424
expected = 0.022880
convolution =
0.022959
0.023036
0.023111
0.023183
0.023253
0.022537
0.022627
0.022714
0.022798
0.022880
The Octave results show the same discrepancy as my C code. Therefore, there must be something wrong with my understanding of the convolution theorem / FFT, not with my FFTW usage.
To convolve 2 signals via FFT you generally need to do this:

1. Zero-pad both signals to a common length of at least N1 + N2 - 1 (the length of the linear convolution result); without the padding, the FFT product gives you a cyclic convolution that wraps around.
2. Compute the forward FFT of both padded signals.
3. Multiply the two transforms element-wise (a full complex multiplication).
4. Compute the inverse FFT of the product; that's your convolution result.
In your code I see FFTW_FORWARD in all 3 FFTs. I'm guessing that if it's not the whole problem, it's at least part of it: the last FFT should be FFTW_BACKWARD, not FFTW_FORWARD. Also, I think you need a "+" in the second expression (the imaginary part of the complex product), not a "-".
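Putting both fixes together, the relevant lines of your code would become something like this (a sketch, keeping your variable names):

// full complex multiplication: (a + bi)(c + di) = (ac - bd) + (ad + bc)i
convolution_f[i][0] = (distances_f[i][0] * sNMDAs_f[i][0]
                     - distances_f[i][1] * sNMDAs_f[i][1]) * scale;
convolution_f[i][1] = (distances_f[i][0] * sNMDAs_f[i][1]
                     + distances_f[i][1] * sNMDAs_f[i][0]) * scale;

// ...

// the inverse transform must be planned with FFTW_BACKWARD
pinv = fftw_plan_dft_1d(numNeurons, convolution_f, convolution,
                        FFTW_BACKWARD, FFTW_ESTIMATE);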
I finally solved the problem, thanks a lot to Alex and Oli Charlesworth for your suggestions! Besides the FFTW_BACKWARD and sign fixes, I basically forgot to order the distances array in the correct way; with that corrected, the FFT output matches the expected values. If anyone is interested, I can provide a more detailed explanation later on.
Update: (Explanation)
Here's what my distances vector (for 5 neurons) looked like initially:

_w(0) _w(1) _w(2) _w(3) _w(4)
On this vector, I applied a kernel (the s_NMDA values), e.g.:

0.1 0.1 0.0 0.2 0.3
Since I was using cyclic convolution, the result for the first neuron (i = 0) was:

0.1 * _w(0) + 0.3 * _w(1) + 0.2 * _w(2) + 0.0 * _w(3) + 0.1 * _w(4)

because the kernel entries get paired with distances at indices (i - j) mod 5, so everything but the first term wraps around.
But this is incorrect; the result should be:
0.1 * _w(0) + 0.1 * _w(1) + 0.0 * _w(2) + 0.2 * _w(3) + 0.3 * _w(4)
To achieve that, I had to "mirror" my distances vector and add some padding to the kernel:

distances: _w(4) _w(3) _w(2) _w(1) _w(0) _w(1) _w(2) _w(3) _w(4)
kernel:     0.1   0.1   0.0   0.2   0.3   0.0   0.0   0.0   0.0
Now, if I apply the cyclic convolution to these two vectors, the results at positions i = [5:9] (1-based, as in Octave) are exactly the results I was looking for, so I only have to discard positions [1:4] and I'm done :)
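In case it helps anyone, here is a minimal self-contained C sketch of the mirror-and-pad approach (my own stripped-down illustration using the 5-neuron example above, not the actual simulator code); it builds with something like gcc demo.c -lfftw3 -lm:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <fftw3.h>

/* same w as above, with sigma = 1 and p = 32 assumed */
static double w(int j)
{
    return tanh(1.0 / (2.0 * 1.0 * 32.0)) * exp(-abs(j) / (1.0 * 32.0));
}

int main(void)
{
    enum { N = 5, M = 2 * N - 1 };  /* neurons; length of mirrored vector */
    const double s[N] = { 0.1, 0.1, 0.0, 0.2, 0.3 };  /* example kernel */

    fftw_complex *dist   = fftw_malloc(sizeof(fftw_complex) * M);
    fftw_complex *kern   = fftw_malloc(sizeof(fftw_complex) * M);
    fftw_complex *dist_f = fftw_malloc(sizeof(fftw_complex) * M);
    fftw_complex *kern_f = fftw_malloc(sizeof(fftw_complex) * M);
    fftw_complex *prod_f = fftw_malloc(sizeof(fftw_complex) * M);
    fftw_complex *result = fftw_malloc(sizeof(fftw_complex) * M);

    /* mirrored distances: w(4) w(3) w(2) w(1) w(0) w(1) w(2) w(3) w(4) */
    for (int i = 0; i < M; ++i) {
        dist[i][0] = w(i - (N - 1));  /* w depends only on |i - (N-1)| */
        dist[i][1] = 0.0;
    }

    /* kernel, zero-padded to length M */
    for (int i = 0; i < M; ++i) {
        kern[i][0] = (i < N) ? s[i] : 0.0;
        kern[i][1] = 0.0;
    }

    fftw_plan pd = fftw_plan_dft_1d(M, dist, dist_f, FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_plan pk = fftw_plan_dft_1d(M, kern, kern_f, FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(pd);
    fftw_execute(pk);

    /* full complex multiplication, scaled by 1/M (FFTW is unnormalized) */
    for (int i = 0; i < M; ++i) {
        prod_f[i][0] = (dist_f[i][0] * kern_f[i][0]
                      - dist_f[i][1] * kern_f[i][1]) / M;
        prod_f[i][1] = (dist_f[i][0] * kern_f[i][1]
                      + dist_f[i][1] * kern_f[i][0]) / M;
    }

    fftw_plan pinv = fftw_plan_dft_1d(M, prod_f, result, FFTW_BACKWARD, FFTW_ESTIMATE);
    fftw_execute(pinv);

    /* positions N-1 .. 2N-2 hold the per-neuron results; compare directly */
    for (int i = 0; i < N; ++i) {
        double expected = 0.0;
        for (int j = 0; j < N; ++j)
            expected += w(i - j) * s[j];
        printf("neuron %d: FFT %f, direct %f\n",
               i, result[N - 1 + i][0], expected);
    }

    fftw_destroy_plan(pd);
    fftw_destroy_plan(pk);
    fftw_destroy_plan(pinv);
    fftw_free(dist);   fftw_free(kern);   fftw_free(dist_f);
    fftw_free(kern_f); fftw_free(prod_f); fftw_free(result);
    return 0;
}

The per-neuron results land at indices N-1 .. 2N-2 of the cyclic convolution; the first N-1 entries are the wrap-around part that gets discarded.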