Analyzing wav and drawing a graph

Posted 2019-03-11 08:37

Question:

I'm trying to draw the waveform from a WAV file, but I'm kinda lost on what length I should take for a sample.

This is what I would love to achieve (without the colors):

So for reading in my data I use the following code:

// first we need to read our wav file, so we can get our info:
byte[] wav = File.ReadAllBytes(filename);

// then we are going to get our file's info
info.NumChannnels = wav[22];
// the sample rate is a 4-byte little-endian field at offset 24, so read all 4 bytes
info.SampleRate = BitConverter.ToInt32(wav, 24);

// nr of samples is the length minus the 44 bytes of header, divided by 2 bytes per 16-bit sample
int samples = (wav.Length - 44) / 2;

// if there are 2 channels, we need to divide the nr of samples by 2
if (info.NumChannnels == 2) samples /= 2;

// create the array
leftChannel = new List<float>();
if (info.NumChannnels == 2) rightChannel = new List<float>();
else rightChannel = null;

int pos = 44; // start of data chunk
for (int i = 0; i < samples; i++) {
    leftChannel.Add(bytesToFloat(wav[pos], wav[pos + 1]));
    pos += 2;
    if (info.NumChannnels == 2) {
        rightChannel.Add(bytesToFloat(wav[pos], wav[pos + 1]));
        pos += 2;
    }
}

bytesToFloat converts 2 bytes to a float between -1 and 1.
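
The helper itself is not shown in the question. A minimal sketch of what such a conversion might look like, assuming 16-bit little-endian PCM data (the name `BytesToFloat` and the scaling by 32768 are assumptions, not the asker's actual implementation):

```csharp
using System;

// Hypothetical stand-in for the bytesToFloat helper: combines two little-endian
// bytes into a signed 16-bit sample and scales it to the range [-1, 1).
static float BytesToFloat(byte firstByte, byte secondByte)
{
    // WAV PCM data is little-endian: the first byte is the low-order byte.
    short sample = unchecked((short)(firstByte | (secondByte << 8)));
    return sample / 32768f;
}

// 0x4000 is 16384, which is half of full scale.
Console.WriteLine(BytesToFloat(0x00, 0x40));
```

Dividing by 32768 maps the most negative sample to exactly -1 and the most positive to just under 1.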

So now I have 2 lists of data, but how do I know how many numbers I should take for creating 1 line?

What confuses me the most: when you play a song, most music players show the following data, which in my eyes is the representation of 1 sample.

But how do you know the value of each of those bars, and how many bars there are in a sample?

Answer 1:

回答1:

Your question is about two different visualisations of audio. To draw the waveform, the code you posted is close to being ready to draw from, but you are adding a single entry per sample to your list. Since audio is often 44100 samples per second, the waveform for a 3 minute song would require almost 8 million pixels across. So what you do is batch them up. For every, say, 4410 samples (i.e. 100 ms), find the highest and lowest values, and then use those to draw the line. In fact, you can usually get away with just finding the max Abs value and drawing a symmetrical waveform.

Here is some code to draw a basic waveform of an audio file in WPF, using NAudio for easier access to the sample values (it can do WAV or MP3 files). I haven't included any splitting out of left and right channels, but that should be fairly easy to add:

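// Needs: using System; using System.Windows; using System.Windows.Controls;
// using System.Windows.Media; using System.Windows.Shapes; using NAudio.Wave;
var window = new Window();
var canvas = new Canvas();
using (var reader = new AudioFileReader(file))
{
    var channels = reader.WaveFormat.Channels;
    // number of sample frames (one value per channel) in the file
    var frames = reader.Length / (channels * reader.WaveFormat.BitsPerSample / 8);
    var max = 0.0f;
    // each line covers at least 40 frames, so the waveform will be at most about 4000 pixels wide;
    // reading whole frames keeps the channels aligned from one batch to the next
    var batch = (int)Math.Max(40, frames / 4000) * channels;
    var mid = 100;
    var yScale = 100;
    float[] buffer = new float[batch];
    int read;
    var xPos = 0;
    // read > 0 (rather than == batch) also draws the final, shorter batch
    while ((read = reader.Read(buffer, 0, batch)) > 0)
    {
        // find the largest absolute sample value in this batch
        for (int n = 0; n < read; n++)
        {
            max = Math.Max(Math.Abs(buffer[n]), max);
        }
        // draw one vertical line, symmetrical around the midline
        var line = new Line();
        line.X1 = xPos;
        line.X2 = xPos;
        line.Y1 = mid + (max * yScale);
        line.Y2 = mid - (max * yScale);
        line.StrokeThickness = 1;
        line.Stroke = Brushes.DarkGray;
        canvas.Children.Add(line);
        max = 0;
        xPos++;
    }
    canvas.Width = xPos;
    canvas.Height = mid * 2;
}
window.Height = 260;
var scrollViewer = new ScrollViewer();
scrollViewer.Content = canvas;
scrollViewer.HorizontalScrollBarVisibility = ScrollBarVisibility.Auto;
window.Content = scrollViewer;
window.ShowDialog();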
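
The code above uses the max Abs shortcut. If you want the highest and lowest value per batch instead, the reduction step could be sketched as below. This is only an illustration on a plain float array with no NAudio or WPF involved, and `ComputePeaks` is a made-up name:

```csharp
using System;
using System.Collections.Generic;

// Reduces the samples to one (Min, Max) pair per batch; each pair becomes
// one vertical line of the waveform. Name and signature are illustrative only.
static List<(float Min, float Max)> ComputePeaks(float[] samples, int batchSize)
{
    if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
    var result = new List<(float Min, float Max)>();
    for (int start = 0; start < samples.Length; start += batchSize)
    {
        // The last batch may be shorter than batchSize.
        int end = Math.Min(start + batchSize, samples.Length);
        float min = samples[start];
        float max = samples[start];
        for (int n = start + 1; n < end; n++)
        {
            min = Math.Min(min, samples[n]);
            max = Math.Max(max, samples[n]);
        }
        result.Add((min, max));
    }
    return result;
}

// Usage: six samples in batches of three give two lines.
var lines = ComputePeaks(new[] { 0.1f, -0.5f, 0.3f, 0.8f, 0.2f, -0.1f }, 3);
Console.WriteLine(lines.Count);
```

Each pair would then be drawn as a line from `mid - Max * yScale` to `mid - Min * yScale`, which gives an asymmetrical waveform when the audio is asymmetrical.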

The second visualisation is sometimes called a spectrogram or a spectrum analyser. It does not represent 1 sample, but represents the frequencies present in a block of samples. To get at this information you need to pass your samples through a Fast Fourier Transform (FFT). Usually you pass through blocks of 1024 samples (it should be a power of 2). Unfortunately FFTs can be tricky to work with if you are new to DSP, as there are several things you need to learn how to do:

  • apply a windowing function
  • get your audio into the right input format (many FFTs expect input as complex numbers)
  • work out which bin numbers correspond to which frequency
  • find the magnitude of each bin and convert it to a decibel scale.
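
To make those four steps concrete, here is a rough sketch using only `System.Numerics.Complex` and a small hand-written radix-2 FFT. It is not the code from the article mentioned below, and the names `Fft`, `SpectrumDb` and `BinFrequency`, the Hann window, and the scaling by block size are all assumptions chosen for illustration:

```csharp
using System;
using System.Numerics;

// In-place iterative radix-2 FFT; data.Length must be a power of two.
static void Fft(Complex[] data)
{
    int n = data.Length;
    if (n == 0 || (n & (n - 1)) != 0) throw new ArgumentException("Length must be a power of 2.");
    // Bit-reversal permutation.
    for (int i = 1, j = 0; i < n; i++)
    {
        int bit = n >> 1;
        for (; (j & bit) != 0; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) (data[i], data[j]) = (data[j], data[i]);
    }
    // Butterfly passes over blocks of size 2, 4, ..., n.
    for (int len = 2; len <= n; len <<= 1)
    {
        double angle = -2 * Math.PI / len;
        var wLen = new Complex(Math.Cos(angle), Math.Sin(angle));
        for (int i = 0; i < n; i += len)
        {
            Complex w = Complex.One;
            for (int k = 0; k < len / 2; k++)
            {
                Complex even = data[i + k];
                Complex odd = data[i + k + len / 2] * w;
                data[i + k] = even + odd;
                data[i + k + len / 2] = even - odd;
                w *= wLen;
            }
        }
    }
}

// Returns a level in dB (relative to an arbitrary reference) for each bin below Nyquist.
static double[] SpectrumDb(float[] block)
{
    int n = block.Length;
    var data = new Complex[n];
    for (int i = 0; i < n; i++)
    {
        // Step 1: Hann window. Step 2: real sample in, imaginary part zero.
        double window = 0.5 * (1 - Math.Cos(2 * Math.PI * i / (n - 1)));
        data[i] = new Complex(block[i] * window, 0);
    }
    Fft(data);
    var db = new double[n / 2];
    for (int bin = 0; bin < n / 2; bin++)
    {
        // Step 4: magnitude scaled by block size, then decibels.
        double magnitude = data[bin].Magnitude / n;
        db[bin] = 20 * Math.Log10(magnitude + 1e-12); // small offset avoids log of zero
    }
    return db;
}

// Step 3: bin k is centred on k * sampleRate / blockSize Hz.
static double BinFrequency(int bin, int sampleRate, int blockSize) =>
    (double)bin * sampleRate / blockSize;

// Usage: with 1024-sample blocks at 44100 Hz, bins are about 43 Hz apart.
Console.WriteLine(BinFrequency(1, 44100, 1024));
```

Each call to `SpectrumDb` gives the bar heights for one block. For production use a tested FFT library is the safer choice.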

You should be able to find further information on each of those topics here on Stack Overflow. I've written a bit about how you can use FFT in C# in this article.



Tags: c# graph wav