So I am aware that a convolution by FFT has a lower computational complexity than a convolution in real space. But what are the downsides of an FFT convolution?
Does the kernel size always have to match the image size, or are there functions that take care of this, for example in Python's numpy and scipy packages? And what about anti-aliasing effects?
FFT convolutions are based on the convolution theorem, which states that given two functions $f$ and $g$, if $\mathcal{F}_d()$ and $\mathcal{F}_i()$ denote the direct and inverse Fourier transform, and $*$ and $\cdot$ denote convolution and multiplication, then:

$$f * g = \mathcal{F}_i\left(\mathcal{F}_d(f) \cdot \mathcal{F}_d(g)\right)$$

To apply this to a signal $f$ and a kernel $g$, there are some things you need to take care of:

- $f$ and $g$ have to be of the same size for the multiplication step to be possible, so you need to zero-pad the kernel.
- If your signal and kernel sizes are $f_l$ and $g_l$, doing a straightforward convolution in the time domain requires $g_l \cdot (f_l - g_l + 1)$ multiplications and $(g_l - 1) \cdot (f_l - g_l + 1)$ additions.
- For the FFT approach, you have to do 3 FFTs of size at least $f_l + g_l$, as well as $f_l + g_l$ multiplications.

For large sizes of both $f$ and $g$, the FFT is clearly superior with its $n \log n$ complexity. For small kernels, the direct approach may be faster.

`scipy.signal` has both `convolve` and `fftconvolve` methods for you to play around with, and `fftconvolve` handles all the padding described above transparently for you.
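As a rough illustration (a sketch with NumPy and SciPy, not part of the original answer), the following compares a manual zero-pad/FFT/multiply/inverse-FFT convolution against `scipy.signal.convolve` and `scipy.signal.fftconvolve`:

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
f = rng.standard_normal(1000)   # signal
g = rng.standard_normal(32)     # kernel

# Direct (time-domain) convolution.
direct = signal.convolve(f, g, mode="full")

# FFT-based convolution; zero-padding is handled internally.
fast = signal.fftconvolve(f, g, mode="full")

# By hand, via the convolution theorem: zero-pad both to the full
# output length, multiply the spectra, then invert.
n = len(f) + len(g) - 1
manual = np.fft.irfft(np.fft.rfft(f, n) * np.fft.rfft(g, n), n)

print(np.allclose(direct, fast))    # True (up to round-off)
print(np.allclose(direct, manual))  # True (up to round-off)
```

Note that `scipy.signal.convolve` also accepts `method="direct"`, `method="fft"`, or `method="auto"`, so you can let SciPy pick the approach for you.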
While fast convolution has better "big O" complexity than direct form convolution, there are a few drawbacks or caveats. I did some thinking about this topic for an article I wrote a while back.
Better "big O" complexity is not always better. Direct form convolution can be faster than using FFTs for filters smaller than a certain size. The exact size depends on the platform and implementations used. The crossover point is usually in the 10-40 coefficient range.
Latency. Fast convolution is inherently a blockwise algorithm. Queueing up hundreds or thousands of samples at a time before transforming them may be unacceptable for some real-time applications.
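To make the blockwise nature concrete, here is a minimal overlap-add sketch (my illustration, not from the answer): every block of input must be buffered before it can be transformed, which is exactly where the latency comes from.

```python
import numpy as np
from scipy import signal

def overlap_add(x, h, block=1024):
    """Blockwise fast convolution: buffer `block` samples, FFT-convolve
    the block with the kernel, and add the result at the right offset."""
    y = np.zeros(len(x) + len(h) - 1)
    for start in range(0, len(x), block):
        chunk = x[start:start + block]
        y[start:start + len(chunk) + len(h) - 1] += signal.fftconvolve(chunk, h)
    return y

rng = np.random.default_rng(1)
x = rng.standard_normal(10_000)
h = rng.standard_normal(257)
print(np.allclose(overlap_add(x, h), signal.fftconvolve(x, h)))  # True
```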
Implementation complexity. Direct form is simpler in terms of memory, code space, and the theoretical background required of the writer/maintainer.
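As a sketch of how little code direct form needs (a plain Python illustration, not tied to any particular platform):

```python
def fir_direct(x, h):
    """Direct-form FIR / full linear convolution: y[n+k] += x[n] * h[k].
    Needs only the output buffer; roughly len(x) * len(h) multiply-adds."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n in range(len(x)):
        for k in range(len(h)):
            y[n + k] += x[n] * h[k]
    return y
```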
On a fixed-point DSP platform (not a general-purpose CPU): the limited word size of fixed-point arithmetic makes large fixed-point FFTs nearly useless. At the other end of the size spectrum, these chips have specialized MAC instructions that are well designed for performing direct form FIR computation, increasing the range over which the O(N^2) direct form is faster than O(N log N). These factors tend to create a limited "sweet spot" where fixed-point FFTs are useful for fast convolution.