可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I'm trying to analyse the audio quality of a file by detecting the highest frequency present (compressed audio will generally be filtered to something less than 20KHz).

I'm reading WAV file data using a class from the soundstretch library which returns PCM samples as floats, then performing FFT on those samples with the fftw3 library. Then for each frequency (rounded to the nearest KHz), I am totalling up the amplitude for that frequency.

So for a low quality file that doesn't contain frequencies above 16KHz, I would expect there to be none or very little amplitude above 16KHz, however I'm not getting the results I would expect. Below is my code:

#include <iostream>
#include <math.h>

#include <fftw3.h>
#include <soundtouch/SoundTouch.h>
#include "include/WavFile.h"

using namespace std;
using namespace soundtouch;

#define BUFF_SIZE           6720
#define MAX_FREQ            22//KHz

static float freqMagnitude[MAX_FREQ];

static void calculateFrequencies(fftw_complex *data, size_t len, int Fs) {
    for (int i = 0; i < len; i++) {
        int re, im;
        float freq, magnitude;
        int index;

        re = data[i][0];
        im = data[i][1];

        magnitude = sqrt(re * re + im * im);
        freq = i * Fs / len;

        index = freq / 1000;//round(freq);
        if (index <= MAX_FREQ) {
            freqMagnitude[index] += magnitude;
        }
    }
}

int main(int argc, char *argv[]) {
    if (argc < 2) {
        cout << "Incorrect args" << endl;
        return -1;
    }

    SAMPLETYPE sampleBuffer[BUFF_SIZE];
    WavInFile inFile(argv[1]);

    fftw_complex *in, *out;
    fftw_plan p;

    in = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * BUFF_SIZE);
    out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * BUFF_SIZE);

    p = fftw_plan_dft_1d(BUFF_SIZE, in, out, FFTW_FORWARD, FFTW_ESTIMATE);

    while (inFile.eof() == 0) {
        size_t samplesRead = inFile.read(sampleBuffer, BUFF_SIZE);

        for (int i = 0; i < BUFF_SIZE; i++) {
            in[i][0] = (double) sampleBuffer[i];
        }

        fftw_execute(p); /* repeat as needed */

        calculateFrequencies(out, samplesRead, inFile.getSampleRate());
    }

    for (int i = 0; i < MAX_FREQ; i += 2) {
        cout << i << "KHz magnitude: " << freqMagnitude[i] << std::endl;
    }

    fftw_destroy_plan(p);
    fftw_free(in);
    fftw_free(out);
 }

Can compile with: - (you'll need the soundtouch library and fftw3 library)

g++ -g -Wall MP3.cpp include/WavFile.cpp -lfftw3 -lm -lsoundtouch -I/usr/local/include -L/usr/local/lib

And here is the spectral analysis of the file I am testing on:

As you can see it's clipped at 16KHz, however my results are as follows:

0KHz magnitude: 4.61044e+07
2KHz magnitude: 5.26959e+06
4KHz magnitude: 4.68766e+06
6KHz magnitude: 4.12703e+06
8KHz magnitude: 12239.6
10KHz magnitude: 456
12KHz magnitude: 3
14KHz magnitude: 650468
16KHz magnitude: 1.83266e+06
18KHz magnitude: 1.40232e+06
20KHz magnitude: 1.1477e+06

I would expect there to be no amplitude over 16KHz, am I doing this right? Is my calculation for frequency correct? (I robbed it off another stackoverflow answer) Could it be something to do with there being 2 channels and I'm not separating the channels?

Cheers for any help guys.

回答1:

You are likely measuring the interleave difference between two stereo channels, which can include high frequencies due to unequal mix and pan. Try again with the channels separated or mixed down to mono, and use a smooth window function to reduce FFT aperture edge artifacts, which can also introduce a small amount of high frequency noise due to your rectangular window.

回答2:

An FFT foundamental requirement is the equally time spacing of samples and their congruence.
In your case a stereo signal supply to the FFT algorithm double the number of samples uncorrelated between themself. What is mathematically seen is the natural phase difference between the two cannels, but, more important, two samples that, because unrelated, can have such a big difference to wrongly represent a square wave (in the time domain it would be represented by an extremely high signal slew rate).
As a solution you have to separate the two channels and perform FFT on one single series of samples, or two different FFT.
I don't think that there could be any aliasing problem because this is normally related to the sampling process and performed using analog filter having bandpass frequency < 1/2 the sampling frequence (Nyquist or antialias filter). If this filtering is missed there are almost no way to remove ghosts (alias spectrum) after.

回答3:

I speak as someone with very slight real-world experience and book learning over a decade ago so this answer might be evidence of a little knowledge being a dangerous thing but I think the problem you're seeing is just aliasing.

Imagine a perfect square wave. You've never heard a perfect square wave because it would require a sound source instantly to transition from one position to another, while still pushing air particles about.

You also can't describe a square wave with a finite number of harmonics. However, you can trivially describe a square wave with any frequency of PCM audio. Therefore any source PCM audio can appear to contain an infinite number of harmonics.

What you can probably do is just sit atop Nyquist and say that if the input audio is N Mhz then the highest-frequency part that can be actual signal is at N/2 Mhz; therefore you can resample the input wave down to twice the first rate less than or equal to N/2 Mhz that shows significant signal without losing meaningful content.