I am using a nice FFT library I found online to see if I can write a pitch-detection program. So far, I have been able to successfully let the library do FFT calculation on a test audio signal containing a few sine waves including one at 440Hz (I'm using 16384 samples as the size and the sample rate at 44100Hz).
The FFT output looks like:
433.356Hz - Real: 590.644 - Imag: -27.9856 - MAG: 16529.5
436.047Hz - Real: 683.921 - Imag: 51.2798 - MAG: 35071.4
438.739Hz - Real: 4615.24 - Imag: 1170.8 - MAG: 5.40352e+006
441.431Hz - Real: -3861.97 - Imag: 2111.13 - MAG: 8.15315e+006
444.122Hz - Real: -653.75 - Imag: 341.107 - MAG: 222999
446.814Hz - Real: -564.629 - Imag: 186.592 - MAG: 105355
As you can see, the 441.431Hz and 438.739Hz bins both show equally high magnitude outputs (the right-most numbers following "MAG:"), so it's obvious that the target frequency 440Hz falls somewhere between. Increasing the resolution might be one way to close in, but that would add to the calculation time.
How do I calculate the exact frequency that falls between two frequency bins?
UPDATE:
I tried out Barry Quinn's "Second Estimator" discussed on the DSPGuru website and got excellent results. The following shows the result for 440Hz square wave - now I'm only off by 0.003Hz!
Here is the code I used. I simply adapted this example I found, which was for Swift. Thank you everyone for your very valuable input, this has been a great learning journey :)
Sinc interpolation can be used to accurately interpolate (or reconstruct) the spectrum between FFT result bins. A zero-padded FFT will produce a similar interpolated spectrum. You can use a high quality interpolator (such as a windowed Sinc kernel) with successive approximation to estimate the actual spectral peak to whatever resolution the S/N allows. This reconstruction might not work near the DC or Fs/2 FFT result bins unless you include the effects of the the spectrum's conjugate image in the interpolation kernel.
See: https://ccrma.stanford.edu/~jos/Interpolation/Ideal_Bandlimited_Sinc_Interpolation.html and https://en.wikipedia.org/wiki/Whittaker%E2%80%93Shannon_interpolation_formula for details about time domain reconstruction, but the same interpolation method works in either domain, frequency or time, for bandlimited or time limited signals respectively.
If you require a less accurate estimator with far less computational overhead, parabolic interpolation (and other similar curve fitting estimators) might work. See: https://www.dsprelated.com/freebooks/sasp/Quadratic_Interpolation_Spectral_Peaks.html and https://mgasior.web.cern.ch/mgasior/pap/FFT_resol_note.pdf for details for parabolic, and http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.555.2873&rep=rep1&type=pdf for other curve fitting peak estimators.
To calculate the "true" frequency, once I used parabola fit algorithm. It worked very well for my use case.
This is the way I followed in order to find the fundamental frequency:
For example, HPS says the fundamental (strongest) pitch is concentrated in bin
x
of your DFT; if binx
belongs to the peaky
, then parabola fit frequency is taken from the peaky
and that is the pitch you were looking for.If you are not looking for fundamental pitch, but exact frequency in any bin, just apply parabola fit for that bin.
Some code to get you started: