Peak frequencies from .wav file

Posted 2019-06-14 07:15

I have a .wav file that I recorded while playing guitar notes. I used the program below, which uses the NAudio library, to read the sample data from the file.

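AudioFileReader readertest = new AudioFileReader(@"E:\song\music.wav");
// Length is in bytes; AudioFileReader delivers 32-bit float samples,
// so divide by sizeof(float) to get the number of samples.
int samplecount = (int)(readertest.Length / sizeof(float));
var buffer = new float[samplecount];
readertest.Read(buffer, 0, samplecount);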

for (int i = 0; i < buffer.Length; i++)
{
    Console.Write(buffer[i] + "\n");
}

It outputs the following (partial output):

       0.00567627
       0.007659912
       0.005187988
       0.005706787
       0.005218506
       0.003051758
       0.004669189
       0.0007324219
       0.004180908
      -0.001586914
       0.00402832
      -0.003479004
       0.003143311
      -0.004577637
       0.001037598
      -0.005432129
      -0.001800537
      -0.005157471

I'm not sure what this output data represents. I want to find the peak frequencies at the points where the notes are played. How can I convert the data above to frequencies?

1 answer
Summer. ? 凉城
#2 · 2019-06-14 07:51

The data you are seeing is the raw samples in floating point format. This is the waveform data that represents the audio signal. When sent to the playback device it produces the sound.
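To make that a bit more concrete: each number is one amplitude sample, nominally between -1.0 and 1.0, and in a stereo file the samples are interleaved (left, right, left, right, ...). The sketch below is only an illustration, and the class and method names are made up for it. It averages the channels into a mono signal and converts a sample index into a time offset, assuming you take the channel count and sample rate from readertest.WaveFormat.

```csharp
using System;

public static class SampleHelpers
{
    // Average interleaved channels (L, R, L, R, ...) into one mono sample per frame.
    public static float[] ToMono(float[] interleaved, int channels)
    {
        if (channels < 1) throw new ArgumentOutOfRangeException(nameof(channels));
        int frames = interleaved.Length / channels;
        var mono = new float[frames];
        for (int f = 0; f < frames; f++)
        {
            float sum = 0f;
            for (int c = 0; c < channels; c++)
                sum += interleaved[f * channels + c];
            mono[f] = sum / channels;
        }
        return mono;
    }

    // Time in seconds at which mono sample number `index` occurs.
    public static double TimeOfSample(int index, int sampleRate)
        => (double)index / sampleRate;
}
```

For a mono recording ToMono simply returns a copy, so it should be safe to call either way.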

To get a frequency map you will need to pass blocks of sample data through an FFT function, which returns a pair of values (X and Y, the real and imaginary parts) for each frequency bin. From these you can calculate the power level of each frequency in the signal. The magnitude of a bin is Sqrt(X*X + Y*Y), and its level in decibels is 20 * Log10(Sqrt(X*X + Y*Y)), which is the same as 10 * Log10(X*X + Y*Y). (And you probably never thought you'd use Pythagoras' theorem outside of trig class!)
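As a rough sketch of that step without any library dependency, the code below runs a plain radix-2 FFT over one block and computes each bin's level in decibels as 10 * Log10(X*X + Y*Y). A different constant factor only rescales the values and does not move the peak. The class and method names are hypothetical, and in practice you would more likely use NAudio's own FFT than this hand-rolled one.

```csharp
using System;
using System.Numerics;

public static class SpectrumSketch
{
    // In-place iterative radix-2 FFT. data.Length must be a power of two.
    public static void Fft(Complex[] data)
    {
        int n = data.Length;
        if (n == 0 || (n & (n - 1)) != 0)
            throw new ArgumentException("Length must be a power of two.", nameof(data));

        // Reorder the input into bit-reversed index order.
        for (int i = 1, j = 0; i < n; i++)
        {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) j ^= bit;
            j ^= bit;
            if (i < j) (data[i], data[j]) = (data[j], data[i]);
        }

        // Butterfly passes over sub-blocks of length 2, 4, 8, ... n.
        for (int len = 2; len <= n; len <<= 1)
        {
            Complex wLen = Complex.FromPolarCoordinates(1.0, -2.0 * Math.PI / len);
            for (int start = 0; start < n; start += len)
            {
                Complex w = Complex.One;
                for (int k = 0; k < len / 2; k++)
                {
                    Complex even = data[start + k];
                    Complex odd = data[start + k + len / 2] * w;
                    data[start + k] = even + odd;
                    data[start + k + len / 2] = even - odd;
                    w *= wLen;
                }
            }
        }
    }

    // Level of every FFT bin in decibels for one block of samples.
    public static double[] PowerDb(float[] block)
    {
        var data = new Complex[block.Length];
        for (int i = 0; i < block.Length; i++) data[i] = new Complex(block[i], 0.0);
        Fft(data);

        var db = new double[data.Length];
        for (int i = 0; i < data.Length; i++)
        {
            double x = data[i].Real, y = data[i].Imaginary;
            db[i] = 10.0 * Math.Log10(x * x + y * y + 1e-20); // tiny offset avoids Log10(0)
        }
        return db;
    }
}
```

Calling SpectrumSketch.PowerDb(block) with a block of 1024 or 4096 samples gives one dB value per bin, which is the array the rest of this answer talks about.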

The resulting array has the same number of items as you passed to the FFT. Element n corresponds to the frequency n * Fs / N, where n is the offset into the array, N is the array length and Fs is the sample rate. Work only with the bottom half of the array (n < N / 2). For real-valued audio input the top half is just a mirror image of the bottom half and is of no use to you, so make sure your sample rate is high enough that the frequencies you are interested in are less than half the sample rate.
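The bin-to-frequency mapping can be sketched as below. This is only an illustration with made-up names: it assumes `levels` holds one value per FFT bin (magnitude or dB, either works), scans the bottom half for the strongest bin, and converts its index with n * Fs / N.

```csharp
using System;

public static class PeakFinder
{
    // Frequency in Hz of bin n for an FFT of length fftLength: n * Fs / N.
    public static double BinToFrequency(int n, int sampleRate, int fftLength)
        => (double)n * sampleRate / fftLength;

    // Frequency of the strongest bin in the bottom half of the spectrum.
    // levels.Length is taken to be the FFT length N.
    public static double PeakFrequency(double[] levels, int sampleRate)
    {
        int n = levels.Length;
        if (n < 4) throw new ArgumentException("Need at least 4 bins.", nameof(levels));

        int best = 1; // start at bin 1 to skip bin 0, the DC offset
        for (int i = 2; i < n / 2; i++)
            if (levels[i] > levels[best]) best = i;

        return BinToFrequency(best, sampleRate, n);
    }
}
```

For example, with Fs = 44100 and N = 1024, bin 10 corresponds to 10 * 44100 / 1024, which is about 430.66 Hz.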

The size of the buffer you pass to the FFT is a trade-off between frequency resolution, response time and allowance for the windowing function. Each bin is Fs / N wide, so too short a buffer gives nasty spectral bleed and poor frequency resolution, while too long a buffer covers more time and can be late recognizing the tones. The length also needs to be a power of two for the FFT, so picking the right value will probably take some experimentation. Test the various options and see which one fits best for you.
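To put numbers on that trade-off, here is a small sketch (hypothetical helper names again) that computes the bin width and block duration for a given FFT length and applies a Hann window to a block. The Hann window is just one common choice used here for illustration, not necessarily the one you will end up with.

```csharp
using System;

public static class BlockSizing
{
    // Width of one frequency bin in Hz: Fs / N.
    public static double BinWidthHz(int sampleRate, int fftLength)
        => (double)sampleRate / fftLength;

    // Duration in seconds of the audio covered by one block: N / Fs.
    public static double BlockSeconds(int sampleRate, int fftLength)
        => (double)fftLength / sampleRate;

    // Multiply a block by a Hann window in place to reduce spectral bleed.
    public static void ApplyHannWindow(float[] block)
    {
        int n = block.Length;
        if (n < 2) return; // nothing meaningful to taper
        for (int i = 0; i < n; i++)
        {
            double w = 0.5 * (1.0 - Math.Cos(2.0 * Math.PI * i / (n - 1)));
            block[i] = (float)(block[i] * w);
        }
    }
}
```

At a 44100 Hz sample rate, N = 1024 gives bins about 43.1 Hz wide from about 23 ms of audio, while N = 16384 gives bins about 2.7 Hz wide but needs about 372 ms of audio per block.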

Mark has written some code for FFT visualization in the NAudioWpfDemo sample application. Have a look at the SpectrumAnalyser custom control, which contains the power function (in SpectrumAnalyser.GetYPosLog). Also look at the SampleAggregator class, which contains the code that aggregates samples into blocks for the FFT.
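If you would rather not dig through the demo first, the whole chain can be sketched roughly as follows using NAudio's FastFourierTransform helper and Complex struct from the NAudio.Dsp namespace. Treat this as an untested outline rather than the demo's actual code: the class name, the method names, the default block size of 4096 and the choice to analyse only the first channel are all assumptions made for illustration.

```csharp
using System;
using NAudio.Dsp;   // Complex, FastFourierTransform
using NAudio.Wave;  // AudioFileReader

public static class NAudioPeakSketch
{
    // Strongest frequency (Hz) in one block of mono samples.
    // block.Length must be a power of two.
    public static double PeakFrequency(float[] block, int sampleRate)
    {
        int n = block.Length;
        int m = (int)Math.Round(Math.Log(n, 2)); // NAudio's FFT takes log2 of the length
        if (n < 4 || (1 << m) != n)
            throw new ArgumentException("Block length must be a power of two.", nameof(block));

        var fft = new Complex[n];
        for (int i = 0; i < n; i++)
        {
            fft[i].X = (float)(block[i] * FastFourierTransform.HammingWindow(i, n));
            fft[i].Y = 0f;
        }
        FastFourierTransform.FFT(true, m, fft);

        // Scan the bottom half only, skipping bin 0 (DC offset).
        int best = 1;
        double bestPower = 0.0;
        for (int i = 1; i < n / 2; i++)
        {
            double power = fft[i].X * fft[i].X + fft[i].Y * fft[i].Y;
            if (power > bestPower) { bestPower = power; best = i; }
        }
        return (double)best * sampleRate / n;
    }

    // Prints one peak frequency per block of the file.
    public static void PrintPeaks(string path, int fftLength = 4096)
    {
        using (var reader = new AudioFileReader(path))
        {
            int channels = reader.WaveFormat.Channels;
            int sampleRate = reader.WaveFormat.SampleRate;
            var raw = new float[fftLength * channels];
            var mono = new float[fftLength];
            long blockIndex = 0;

            // A final partial block is ignored in this sketch.
            while (reader.Read(raw, 0, raw.Length) == raw.Length)
            {
                for (int f = 0; f < fftLength; f++)
                    mono[f] = raw[f * channels]; // first channel only

                double seconds = (double)blockIndex * fftLength / sampleRate;
                Console.WriteLine($"{seconds:F2}s  {PeakFrequency(mono, sampleRate):F1} Hz");
                blockIndex++;
            }
        }
    }
}
```

You would call it as NAudioPeakSketch.PrintPeaks(@"E:\song\music.wav"). Blocks here do not overlap, so the time resolution is one block length; overlapping the blocks is a common refinement if you need finer timing.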
