I want to implement the Fast Fourier Transform in Java for chord recognition, but I don't really understand it. I have read that the number of samples should be a power of 2, so what should I do for a song whose number of samples is not a power of 2? I would also like to know about the STFT (short-time Fourier transform).
Question:
Answer 1:
You normally compute an STFT by sliding a window through the file. The window size is chosen to cover a period over which the characteristics of the sound do not change greatly. A window might typically be around 10 ms, so at a sample rate of 44.1 kHz, for example, you might use a window size of N = 512 samples (about 11.6 ms), which gives roughly the required duration and is a power of 2. You then take successive chunks of N samples through the file and compute the FFT of each N-point chunk. (Note: in most cases you actually want the magnitude of the FFT output; its square gives an estimate of the power spectrum.) For finer time resolution the chunks can overlap, e.g. by 50%, but this of course increases the processing load. The end result is a succession of short-term spectra, so in effect you have a 2D matrix of amplitudes indexed by frequency and time (a 3D surface: amplitude vs. frequency vs. time) that describes the contents of the sound in the frequency domain.
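As a rough illustration of the chunk-by-chunk procedure described above, here is a minimal Java sketch. The class and method names (`StftSketch`, `fft`, `stft`) and the `hop` parameter are made up for this example. The FFT is a plain textbook radix-2 routine rather than a library call, no window function is applied to the chunks, and a trailing partial chunk is simply dropped.

```java
import java.util.Arrays;

final class StftSketch {

    /** In-place iterative radix-2 FFT; the length must be a power of two. */
    static void fft(double[] re, double[] im) {
        int n = re.length;
        if (Integer.bitCount(n) != 1 || im.length != n) {
            throw new IllegalArgumentException("length must be a power of two");
        }
        // Reorder the input into bit-reversed index order.
        for (int i = 1, j = 0; i < n; i++) {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) {
                j ^= bit;
            }
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        // Butterfly passes over blocks of size 2, 4, ..., n.
        for (int len = 2; len <= n; len <<= 1) {
            double ang = -2 * Math.PI / len;
            for (int start = 0; start < n; start += len) {
                for (int k = 0; k < len / 2; k++) {
                    double wr = Math.cos(ang * k);
                    double wi = Math.sin(ang * k);
                    int a = start + k;
                    int b = a + len / 2;
                    double xr = re[b] * wr - im[b] * wi;
                    double xi = re[b] * wi + im[b] * wr;
                    re[b] = re[a] - xr;
                    im[b] = im[a] - xi;
                    re[a] += xr;
                    im[a] += xi;
                }
            }
        }
    }

    /**
     * Magnitude spectra of successive n-sample chunks, advancing by hop samples.
     * Result is indexed [frame][bin], with bins 0..n/2 (the non-redundant half
     * for real input). Assumption: samples that do not fill a whole chunk at
     * the end are ignored.
     */
    static double[][] stft(double[] samples, int n, int hop) {
        if (hop <= 0) {
            throw new IllegalArgumentException("hop must be positive");
        }
        if (samples.length < n) {
            return new double[0][];
        }
        int frames = 1 + (samples.length - n) / hop;
        double[][] spec = new double[frames][n / 2 + 1];
        for (int f = 0; f < frames; f++) {
            double[] re = Arrays.copyOfRange(samples, f * hop, f * hop + n);
            double[] im = new double[n];
            fft(re, im);
            for (int k = 0; k <= n / 2; k++) {
                spec[f][k] = Math.hypot(re[k], im[k]);
            }
        }
        return spec;
    }
}
```

With `n = 512` and `hop = 256` this gives the 50% overlap mentioned above; `spec[f][k]` is then the magnitude at frequency `k * sampleRate / n` for the chunk starting at sample `f * hop`.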
Answer 2:
Normally you just pad the data with zeros up to the next power of two.
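A minimal Java sketch of that padding step might look like the following. The class and method names (`ZeroPad`, `nextPowerOfTwo`, `padToPowerOfTwo`) are hypothetical and chosen only for this example; it relies on `Arrays.copyOf` filling the extra elements of a `double[]` with zeros.

```java
import java.util.Arrays;

final class ZeroPad {

    /** Smallest power of two that is greater than or equal to n. */
    static int nextPowerOfTwo(int n) {
        if (n < 1 || n > (1 << 30)) {
            throw new IllegalArgumentException("n must be in [1, 2^30]");
        }
        int highest = Integer.highestOneBit(n);
        return highest == n ? n : highest << 1;
    }

    /** Copy of x extended with trailing zeros to a power-of-two length. */
    static double[] padToPowerOfTwo(double[] x) {
        return Arrays.copyOf(x, nextPowerOfTwo(x.length));
    }
}
```

For instance, a 441-sample chunk would be padded to 512 samples, while an array whose length is already a power of two is copied unchanged.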