可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I want to do the FFT of an audio signal in real time, meaning while the person is speaking in the microphone. I will fetch the data (I do this with portaudio, if it would be easier with wavein I would be happy to use that - if you can tell me how). Next I am using the FFTW library - I know how to perform 1D, 2D (real&complex) FFT, but I am not so sure how to do this, since I would have to do a 3D FFT to get frequency, amplitude (this would determine the color gradient) and time. Or is it just a 2D FFT, and I get amplitude and frequency?

回答1:

If you need amplitude, frequency and time in one graph, then the transform is known as a Time-Frequency decomposition. The most popular one is called the Short Time Fourier Transform. It works as follows:
1. Take a small portion of the signal (say 1 second)
2. Window it with a small window (say 5 ms)
3. Compute the 1D fourier transform of the windowed signal.
4. Move the window by a small amount (2.5 ms)
5. Repeat above steps until end of signal.
6. All of this data is entered into a matrix that is then used to create the kind of 3D representation of the signal that shows its decomposition along frequency, amplitude and time.

The length of the window will decide the resolution you are able to obtain in frequency and time domains. Check here for more details on STFT and search for "Robi Polikar"'s tutorials on wavelet transforms for a layman's introduction to the above.

Edit 1:
You take a windowing function (there are innumerable functions out there - here is a list. Most intuitive is a rectangular window but the most commonly used are the Hamming/Hanning window functions. You can follow the steps below if you have a paper-pencil in hand and draw it along.

Assume that the signal that you have obtained is 1 sec long and is named x[n]. The windowing function is 5 msec long and is named w[n]. Place the window at the start of the signal (so the end of the window coincides with the 5ms point of the signal) and multiply the x[n] and w[n] like so:
y[n] = x[n] * w[n] - point by point multiplication of the signals.
Take an FFT of y[n].

Then you shift the window by a small amount (say 2.5 msec). So now the window stretches from 2.5ms to 7.5 ms of the signal x[n]. Repeat the multiplication and FFT generation steps. In other words, you have an overlap of 2.5 msec. You will see that changing the length of the window and the overlap gives you different resolutions on the time and Frequency axis.

Once you do this, you need to feed all the data into a matrix and then have it displayed. The overlap is for minimising the errors that might arise at boundaries and also to get more consistent measurements over such short time frames.

P.S: If you had understood STFT and other time-frequency decompositions of a signal, then you should have had no problems with steps 2 and 4. That you have not understood the above mentioned steps makes me feel like you should revisit time-frequency decompositions also.

回答2:

I use a Sliding DFT, which is many times faster than an FFT in the case where you need to do a fourier transform each time a sample arrives in the input buffer.

It's based on the fact that once you have performed a fourier transform for the last N samples, and a new sample arrives, you can "undo" the effect of the oldest sample, and apply the effect of the latest sample, in a single pass through the fourier data! This means that the sliding DFT performance is O(n) compared with O(Log2(n) times n) for the FFT. Also, there's no restriction to powers of two for the buffer size to maintain performance.

The complete test program below compares the sliding DFT with fftw. In my production code I've optimized the below code to unreadibility, to make it three times faster.

    #include <complex>
#include <iostream>
#include <time.h>
#include <math_defines.h>
#include <float.h>

#define DO_FFTW // libfftw
#define DO_SDFT

#if defined(DO_FFTW)
#pragma comment( lib, "d:\\projects\\common\\fftw\\libfftw3-3.lib" )

namespace fftw {
#include <fftw/fftw3.h>
}

fftw::fftw_plan plan_fwd;
fftw::fftw_plan plan_inv;

#endif

typedef std::complex<double> complex;

// Buffer size, make it a power of two if you want to improve fftw
const int N = 750;


// input signal
complex in[N];

// frequencies of input signal after ft
// Size increased by one because the optimized sdft code writes data to freqs[N]
complex freqs[N+1];

// output signal after inverse ft of freqs
complex out1[N];
complex out2[N];

// forward coeffs -2 PI e^iw -- normalized (divided by N)
complex coeffs[N];
// inverse coeffs 2 PI e^iw
complex icoeffs[N];

// global index for input and output signals
int idx;


// these are just there to optimize (get rid of index lookups in sdft)
complex oldest_data, newest_data;

//initilaize e-to-the-i-thetas for theta = 0..2PI in increments of 1/N
void init_coeffs()
{
    for (int i = 0; i < N; ++i) {
        double a = -2.0 * PI * i  / double(N);
        coeffs[i] = complex(cos(a)/* / N */, sin(a) /* / N */);
    }
    for (int i = 0; i < N; ++i) {
        double a = 2.0 * PI * i  / double(N);
        icoeffs[i] = complex(cos(a),sin(a));
    }
}


// initialize all data buffers
void init()
{
    // clear data
    for (int i = 0; i < N; ++i)
        in[i] = 0;
    // seed rand()
    srand(857);
    init_coeffs();
    oldest_data = newest_data = 0.0;
    idx = 0;
}

// simulating adding data to circular buffer
void add_data()
{
    oldest_data = in[idx];
    newest_data = in[idx] = complex(rand() / double(N));

}


// sliding dft
void sdft()
{
    complex delta = newest_data - oldest_data;
    int ci = 0;
    for (int i = 0; i < N; ++i) {
        freqs[i] += delta * coeffs[ci];
        if ((ci += idx) >= N)
            ci -= N;
    }
}

// sliding inverse dft
void isdft()
{
    complex delta = newest_data - oldest_data;
    int ci = 0;
    for (int i = 0; i < N; ++i) {
        freqs[i] += delta * icoeffs[ci];
        if ((ci += idx) >= N)
            ci -= N;
    }
}

// "textbook" slow dft, nested loops, O(N*N)
void ft()
{
    for (int i = 0; i < N; ++i) {
        freqs[i] = 0.0;
        for (int j = 0; j < N; ++j) {
            double a = -2.0 * PI * i * j / double(N);
            freqs[i] += in[j] * complex(cos(a),sin(a));
        }
    }
}

double mag(complex& c)
{
    return sqrt(c.real() * c.real() + c.imag() * c.imag());
}

void powr_spectrum(double *powr)
{
    for (int i = 0; i < N/2; ++i) {
        powr[i] = mag(freqs[i]);
    }

}

int main(int argc, char *argv[])
{
    const int NSAMPS = N*10;
    clock_t start, finish;

#if defined(DO_SDFT)
// ------------------------------ SDFT ---------------------------------------------
    init();

    start = clock();
    for (int i = 0; i < NSAMPS; ++i) {

        add_data();

        sdft();
        // Mess about with freqs[] here
        //isdft();

        if (++idx == N) idx = 0; // bump global index

        if ((i % 1000) == 0)
            std::cerr << i << " iters..." << '\r';
    }
    finish = clock();

    std::cout << "SDFT: " << NSAMPS / ((finish-start) / (double)CLOCKS_PER_SEC) << " fts per second." << std::endl;

    double powr1[N/2];
    powr_spectrum(powr1);
#endif

#if defined(DO_FFTW)

// ------------------------------ FFTW ---------------------------------------------
    plan_fwd = fftw::fftw_plan_dft_1d(N, (fftw::fftw_complex *)in, (fftw::fftw_complex *)freqs, FFTW_FORWARD, FFTW_MEASURE);
    plan_inv = fftw::fftw_plan_dft_1d(N, (fftw::fftw_complex *)freqs, (fftw::fftw_complex *)out2, FFTW_BACKWARD, FFTW_MEASURE);

    init();

    start = clock();
    for (int i = 0; i < NSAMPS; ++i) {

        add_data();

        fftw::fftw_execute(plan_fwd);
        // mess about with freqs here
        //fftw::fftw_execute(plan_inv);

        if (++idx == N) idx = 0; // bump global index

        if ((i % 1000) == 0)
            std::cerr << i << " iters..." << '\r';
    }
    // normalize fftw's output 
    for (int j = 0; j < N; ++j)
        out2[j] /= N;

    finish = clock();

    std::cout << "FFTW: " << NSAMPS / ((finish-start) / (double)CLOCKS_PER_SEC) << " fts per second." << std::endl;
    fftw::fftw_destroy_plan(plan_fwd);
    fftw::fftw_destroy_plan(plan_inv);

    double powr2[N/2];
    powr_spectrum(powr2);
#endif
#if defined(DO_SDFT) && defined(DO_FFTW)
// ------------------------------      ---------------------------------------------
    const double MAX_PERMISSIBLE_DIFF = 1e-11; // DBL_EPSILON;
    double diff;
    // check my ft gives same power spectrum as FFTW
    for (int i = 0; i < N/2; ++i)
        if ( (diff = abs(powr1[i] - powr2[i])) > MAX_PERMISSIBLE_DIFF)
            printf("Values differ by more than %g at index %d.  Diff = %g\n", MAX_PERMISSIBLE_DIFF, i, diff);

#endif
    return 0;
}

回答3:

You can create a realtime FFT by choosing a short time-span and analysing (FFT'ing) just that time-span. You can probably get away with just selecting non-overlapping timespans of say 100-500 milliseconds; the analytically purer way to do this would be using a sliding-window (again of e.g. 100-500 ms), but that is often unnecessary and you can show nice graphics with the non-overlapping timespans without much processing power.

回答4:

Real-time FFT means completely different from what you just described. It means that for given N and X[N] your algorithm gives Fx[i] while incrementing value i. Meaning, proceeding value does not compute until current value computation completed. This is completely different from what you just described.

Hardware usually uses FFT with around 1k-16k points. Fixed N, not real-time computation. Moving window FFT as described with previous answers.