可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
I have been looking through this fantastic article: http://blogs.zynaptiq.com/bernsee/pitch-shifting-using-the-ft/
While being fantastic, it is extremely hard and heavy going. This material is really stretching me.
I have extracted the maths from Stefan's code module that calculates the exact frequency for a given bin. But I don't understand the last calculation. Can someone explain to me the mathematical construction at the end?
Before digging into the code, let me set the scene:
Let's say we set fftFrameSize = 1024, so we are dealing with 512+1 bins
As an example, Bin[1]'s ideal frequency fits a single wave in the frame. At a sample rate of 40KHz, tOneFrame = 1024/40K seconds = 1/40s, so Bin[1] would ideally be collecting a 40Hz signal.
Setting osamp (overSample) = 4, we progress along our input signal in steps of 256. So the first analysis examines bytes zero through 1023, then 256 through 1279, etc. Note each float gets processed 4 times.
...
void calcBins(
long fftFrameSize,
long osamp,
float sampleRate,
float * floats,
BIN * bins
)
{
/* initialize our static arrays */
static float gFFTworksp[2*MAX_FRAME_LENGTH];
static float gLastPhase[MAX_FRAME_LENGTH/2+1];
static long gInit = 0;
if (! gInit)
{
memset(gFFTworksp, 0, 2*MAX_FRAME_LENGTH*sizeof(float));
memset(gLastPhase, 0, (MAX_FRAME_LENGTH/2+1)*sizeof(float));
gInit = 1;
}
/* do windowing and re,im interleave */
for (long k = 0; k < fftFrameSize; k++)
{
double window = -.5*cos(2.*M_PI*(double)k/(double)fftFrameSize)+.5;
gFFTworksp[2*k] = floats[k] * window;
printf("sinValue: %f", gFFTworksp[2*k]);
gFFTworksp[2*k+1] = 0.;
}
/* do transform */
smbFft(gFFTworksp, fftFrameSize, -1);
printf("\n");
/* this is the analysis step */
for (long k = 0; k <= fftFrameSize/2; k++)
{
/* de-interlace FFT buffer */
double real = gFFTworksp[2*k];
double imag = gFFTworksp[2*k+1];
/* compute magnitude and phase */
double magn = 2.*sqrt(real*real + imag*imag);
double phase = atan2(imag,real);
/* compute phase difference */
double phaseDiff = phase - gLastPhase[k];
gLastPhase[k] = phase;
/* subtract expected phase difference */
double binPhaseOffset = M_TWOPI * (double)k / (double)osamp;
double deltaPhase = phaseDiff - binPhaseOffset;
/* map delta phase into [-Pi, Pi) interval */
// better, but obfuscatory...
// deltaPhase -= M_TWOPI * floor(deltaPhase / M_TWOPI + .5);
while (deltaPhase >= M_PI)
deltaPhase -= M_TWOPI;
while (deltaPhase < -M_PI)
deltaPhase += M_TWOPI;
(EDIT:) Now the bit I don't get:
// Get deviation from bin frequency from the +/- Pi interval
// Compute the k-th partials' true frequency
// Start with bin's ideal frequency
double bin0Freq = (double)sampleRate / (double)fftFrameSize;
bins[k].idealFreq = (double)k * bin0Freq;
// Add deltaFreq
double sampleTime = 1. / (double)sampleRate;
double samplesInStep = (double)fftFrameSize / (double)osamp;
double stepTime = sampleTime * samplesInStep;
double deltaTime = stepTime;
// Definition of frequency is rate of change of phase, i.e. f = dϕ/dt
// double deltaPhaseUnit = deltaPhase / M_TWOPI; // range [-.5, .5)
double freqAdjust = (1. / M_TWOPI) * deltaPhase / deltaTime;
// Actual freq <-- WHY ???
bins[k].freq = bins[k].idealFreq + freqAdjust;
}
}
I just can't see it clearly, even though it seems to be staring in the face. Could someone please explain this process from scratch, step by step?
回答1:
The basic principle is very simple. If a given component exactly matches a bin frequency then its phase will not change from one FT to the next. However if the frequency does not correspond exactly with the bin frequency then there will be a phase change between successive FTs. The frequency delta is just:
delta_freq = delta_phase / delta_time
and the refined estimate of the frequency of the component will then be:
freq_est = bin_freq + delta_freq
回答2:
I have implemented this algorithm for Performous myself. When you take another FFT at a time offset, you expect the phase to change according to the offset, i.e. two FFTs taken 256 samples apart should have a phase difference of 256 samples for all frequencies present in the signal (this assumes that the signals themselves are steady, which is a good assumption for short periods like 256 samples).
Now, the actual phase values you get from FFT are not in samples but in phase angle, so they will be different depending on the frequency. In the following code the phaseStep value is the conversion factor needed per bin, i.e. for frequency corresponding to bin x, the phase shift will be x * phaseStep. For bin center frequencies x would be an integer (the bin number) but for actual detected frequencies it may be any real number.
const double freqPerBin = SAMPLE_RATE / FFT_N;
const double phaseStep = 2.0 * M_PI * FFT_STEP / FFT_N;
The correction works by assuming that the signal in a bin has the bin center frequency and then calculating the expected phase shift for that. This expected shift is substracted from the actual shift, leaving the error. A remainder (modulo 2 pi) is taken (-pi to pi range) and the final frequency is calculated with bin center + correction.
// process phase difference
double delta = phase - m_fftLastPhase[k];
m_fftLastPhase[k] = phase;
delta -= k * phaseStep; // subtract expected phase difference
delta = remainder(delta, 2.0 * M_PI); // map delta phase into +/- M_PI interval
delta /= phaseStep; // calculate diff from bin center frequency
double freq = (k + delta) * freqPerBin; // calculate the true frequency
Notice that many adjacent bins often end up corrected to the same frequency because the delta correction can be up to 0.5 * FFT_N / FFT_STEP bins either way so the smaller FFT_STEP you use, the further away will corrections be possible (but this increases the processing power needed as well as imprecision due to inaccuracies).
I hope this helps :)
回答3:
This is the frequency estimation technique used by phase vocoder methods.
If you look at a single point on a (fixed frequency and fixed amplitude) sine wave in time, the phase will advance with time by an amount proportional to the frequency. Or you can do the converse: if you measure how much the phase of a sinusoid changes over any unit of time, you can calculate the frequency of that sinusoid.
A phase vocoder uses two FFTs to estimate phase with reference to two FFT windows, and the offset of the two FFTs is the distance between the 2 phase measurements in time. From thence, you have your frequency estimate for that FFT bin (an FFT bin being roughly a filter to isolate a sinusoidal component or other sufficiently narrowband signal that fits within that bin).
For this method to work, the spectrum near the FFT bin in use has to be fairly stationary, e.g. not changing in frequency, etc. That's the assumption a phase vocoder requires.
回答4:
Finally I have figured this out; really I had to derive it from scratch. I knew there would be some simple way to derive it, my (usual) mistake was to attempt to follow other people's logic rather than use my own common sense.
This puzzle takes two keys to unlock it.
...
for (int k = 0; k <= fftFrameSize/2; k++)
{
// compute magnitude and phase
bins[k].mag = 2.*sqrt(fftBins[k].real*fftBins[k].real + fftBins[k].imag*fftBins[k].imag);
bins[k].phase = atan2(fftBins[k].imag, fftBins[k].real);
// Compute phase difference Δϕ fo bin[k]
double deltaPhase;
{
double measuredPhaseDiff = bins[k].phase - gLastPhase[k];
gLastPhase[k] = bins[k].phase;
// Subtract expected phase difference <-- FIRST KEY
// Think of a single wave in a 1024 float frame, with osamp = 4
// if the first sample catches it at phase = 0, the next will
// catch it at pi/2 ie 1/4 * 2pi
double binPhaseExpectedDiscrepancy = M_TWOPI * (double)k / (double)osamp;
deltaPhase = measuredPhaseDiff - binPhaseExpectedDiscrepancy;
// Wrap delta phase into [-Pi, Pi) interval
deltaPhase -= M_TWOPI * floor(deltaPhase / M_TWOPI + .5);
}
// say sampleRate = 40K samps/sec, fftFrameSize = 1024 samps in FFT giving bin[0] thru bin[512]
// then bin[1] holds one whole wave in the frame, ie 44 waves in 1s ie 44Hz ie sampleRate / fftFrameSize
double bin0Freq = (double)sampleRate / (double)fftFrameSize;
bins[k].idealFreq = (double)k * bin0Freq;
// Consider Δϕ for bin[k] between hops.
// write as 2π / m.
// so after m hops, Δϕ = 2π, ie 1 extra cycle has occurred <-- SECOND KEY
double m = M_TWOPI / deltaPhase;
// so, m hops should have bin[k].idealFreq * t_mHops cycles. plus this extra 1.
//
// bin[k].idealFreq * t_mHops + 1 cycles in t_mHops seconds
// => bins[k].actualFreq = bin[k].idealFreq + 1 / t_mHops
double tFrame = fftFrameSize / sampleRate;
double tHop = tFrame / osamp;
double t_mHops = m * tHop;
bins[k].freq = bins[k].idealFreq + 1. / t_mHops;
}
回答5:
Maybe this will help. Think of the FFT bins as specifying little clocks or rotors, each spinning at the frequency of the bin. For a stable signal, the (theoretical) next position of the rotor can be predicted using the math in the bit you don't get.
Against this "should be" (ideal) position, you can compute several useful things: (1) the difference with the phase in a bin of an adjacent frame, which is used by a phase vocoder to better estimate of the bin frequency, or (2) more generally phase deviation, which is a positive indicator of a note onset or some other event in the audio.
回答6:
Signal frequencies that fall exactly on a bin frequency advance bin phase by integer multiples of 2π. Since the bin phases that correspond to the bin frequencies are multiples of 2π due to the periodic nature of the FFT there is no phase change in this case. The article you mention also explains this.