Using FFTW I have been computing the FFT of normalised .wav file data. I am a bit confused as to how I should normalise the FFT output, however. I have been using the method which seemed obvious to me, which is simply to divide by the highest FFT magnitude. I have seen scaling by 1/N and division by N/2 recommended, however (where I assume N = FFT size). How do these work as normalisation factors? There doesn't seem to me to be an intuitive relation between these factors and the actual data, so what am I missing?
Huge thanks in advance for any help on this.
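Surprisingly, there is no single agreed definition of the FFT and the IFFT, at least as far as scaling is concerned. With implementations that leave both directions unscaled (FFTW is one), a forward transform followed by an inverse transform returns the input multiplied by N, so you need to scale by 1/N in the forward direction and apply no scaling in the reverse direction. With 1/N scaling, bin 0 is the mean of the signal, and a real sinusoid of amplitude A shows up as A/2 in each of its positive- and negative-frequency bins; dividing by N/2 instead gives A directly when you only look at the positive-frequency half.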
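As a rough illustration of where 1/N and N/2 come from, here is a minimal sketch in Python. The `dft` helper is a naive stand-in written for this example (it is not FFTW, but it uses the same unscaled forward convention), and the amplitude, offset and bin number are made-up test values.

```python
import cmath
import math

def dft(x):
    """Naive unscaled forward DFT (stand-in for an unnormalised FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

N = 64
amplitude = 0.8   # hypothetical sine amplitude
offset = 0.25     # hypothetical DC offset
k0 = 5            # the sine completes exactly 5 cycles in N samples

signal = [offset + amplitude * math.sin(2 * math.pi * k0 * t / N)
          for t in range(N)]
spectrum = dft(signal)

# Dividing bin 0 by N recovers the mean (DC) level of the signal.
dc_estimate = abs(spectrum[0]) / N

# Dividing a positive-frequency bin by N/2 recovers the sine amplitude,
# because the other half of the energy sits in the mirrored negative bin.
amp_estimate = abs(spectrum[k0]) / (N / 2)

print(round(dc_estimate, 6), round(amp_estimate, 6))
```

The recovered values match the inputs only because the sine falls exactly on a bin and no window is applied; otherwise the peak is spread over neighbouring bins.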
Usually (for performance reasons) you will want to lump this scaling factor in with any other corrections, such as your A/D gain, window gain correction factor, etc., so that you have just one combined scale factor to apply to your FFT output bins. Alternatively, if you are just generating, say, a power spectrum in dB, then you can make the correction a single dB value that you subtract from your power spectrum bins.
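The idea of one combined correction can be sketched as follows. This is only an illustration under assumed values: `adc_gain`, `true_amplitude` and the choice of a Hann window are hypothetical, and `dft` is again a naive stand-in for an unscaled FFT.

```python
import cmath
import math

def dft(x):
    """Naive unscaled forward DFT (stand-in for an unnormalised FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

N = 64
k0 = 8
adc_gain = 2.0         # hypothetical: recorded value = 2.0 * true value
true_amplitude = 0.5   # hypothetical amplitude before the A/D stage

# Hann window; its coherent gain is the mean of the window samples (0.5 here).
window = [0.5 - 0.5 * math.cos(2 * math.pi * t / N) for t in range(N)]
window_gain = sum(window) / N

recorded = [adc_gain * true_amplitude * math.sin(2 * math.pi * k0 * t / N)
            for t in range(N)]
spectrum = dft([w * s for w, s in zip(window, recorded)])

# One combined factor: FFT scaling (N/2), window gain and A/D gain together.
combined = (N / 2) * window_gain * adc_gain
correction_db = 20 * math.log10(combined)

# Linear route: divide each bin by the combined factor.
peak_linear = abs(spectrum[k0]) / combined

# dB route: subtract a single dB value from the raw dB spectrum.
peak_db = 20 * math.log10(abs(spectrum[k0])) - correction_db

print(round(peak_linear, 6), round(peak_db, 3))
```

Both routes describe the same peak: `peak_db` is simply `20 * log10(peak_linear)`, so the correction can be applied wherever it is cheapest.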
It's often useful with FFTs to be able to apply Parseval's Theorem and to make other comparisons that require a meaningful magnitude. The height of any individual peak, by contrast, isn't very useful on its own: it depends, for example, on the window that is used in calculating the FFT, as this can shorten and broaden the peak. For these reasons, I'd recommend against normalising by the largest peak, as you then lose any easy connection to meaningful magnitudes, easy comparison between data sets, etc.
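A small sketch of both points, using an invented two-tone test signal and a naive `dft` helper in place of a real FFT: Parseval's Theorem ties the spectrum to the signal energy when the scaling is known, while the peak height changes as soon as a window is applied.

```python
import cmath
import math

def dft(x):
    """Naive unscaled forward DFT (stand-in for an unnormalised FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

N = 32
# Hypothetical test signal: two tones that fall exactly on bins 3 and 7.
signal = [math.sin(2 * math.pi * 3 * t / N)
          + 0.5 * math.cos(2 * math.pi * 7 * t / N) for t in range(N)]
spectrum = dft(signal)

# Parseval for an unscaled DFT: sum |x|^2 == (1/N) * sum |X|^2.
time_energy = sum(v * v for v in signal)
freq_energy = sum(abs(X) ** 2 for X in spectrum) / N

# The same signal with a Hann window: the tallest peak is lower and broader.
hann = [0.5 - 0.5 * math.cos(2 * math.pi * t / N) for t in range(N)]
windowed = dft([w * s for w, s in zip(hann, signal)])

plain_peak = max(abs(X) for X in spectrum)
windowed_peak = max(abs(X) for X in windowed)

print(round(time_energy, 6), round(freq_energy, 6))
print(round(windowed_peak / plain_peak, 6))
```

Dividing each spectrum by its own largest peak would hide the difference between the two cases and break the energy relation, which is the loss of "meaningful magnitude" described above.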