For my work I need to perform discrete fourier transformations (DFTs) on large images. In the current example I require a 3D FT for a 1921 x 512 x 512 image (along with 2D FFTs of 512 x 512 images). Right now, I am using the numpy package and the associated function np.fft.fftn(). The code snippet below exemplarily shows 2D and 3D FFT times on an equal-sized/slightly smaller 2D/3D random-number-generated grid in the following way:
import sys
import numpy as np
import time
tas = time.time()
a = np.random.rand(512, 512)
tab = time.time()
b = np.random.rand(100, 512, 512)
tbfa = time.time()
fa = np.fft.fft2(a)
tfafb = time.time()
fb = np.fft.fftn(b)
tfbe = time.time()
print "initializing 512 x 512 grid:", tab - tas
print "initializing 100 x 512 x 512 grid:", tbfa - tab
print "2D FFT on 512 x 512 grid:", tfafb - tbfa
print "3D FFT on 100 x 512 x 512 grid:", tfbe - tfafb
Output:
initializing 512 x 512 grid: 0.00305700302124
initializing 100 x 512 x 512 grid: 0.301637887955
2D FFT on 512 x 512 grid: 0.0122730731964
3D FFT on 100 x 512 x 512 grid: 3.88418793678
The problem that I have is that I will need this process quite often, so the time spent per image should be short. When testing on my own computer (middle-segment laptop, 2GB RAM allocated to virtual machine (--> therefore smaller test grid)), as you can see the 3D FFT takes ~ 5 s (order-of-magnitude). Now, at work, the machines are way better, cluster/grid-architecture systems and FFTs are much faster. In both cases the 2D ones finish quasi instantaneously.
However with 1921x512x512, np.fft.fftn() takes ~ 5 min. Since I guess scipy's implementation is not much faster and considering that on MATLAB FFTs of same-sized grids finish within ~ 5 s, my question is whether there is a method to speed the process up to or almost to MATLAB times. My knowledge about FFTs is limited, but apparently MATLAB uses the FFTW algorithm, which python does not. Any reasonable chance that with some pyFFTW package I get similar times? Also, 1921 seems an unlucky choice, having only 2 prime factors (17, 113), so I assume this also plays a role. On the other hand 512 is a well-suited power of two. Are MATLAB-like times achievable if possible also without padding up with zeros to 2048?
I'm asking because I'll have to use FFTs a lot (to an amount where such differences will be of huge influence!) and in case there is no possibility to reduce computation times in python, I'd have to switch to other, faster implementations.