Visualizing volume of PCM samples

Posted 2019-04-15 15:17

Question:

I have several chunks of PCM audio (G.711) in my C++ application. I would like to visualize the different audio volume in each of these chunks.

My first attempt was to calculate the average of the sample values for each chunk and use that as a volume indicator, but this doesn't work well. I do get 0 for chunks with silence and differing values for chunks with audio, but the values differ only slightly and don't seem to reflect the actual volume.

What would be a better algorithm for calculating the volume?

I hear G.711 audio is logarithmic PCM. How should I take that into account?

Answer 1:

Note: I haven't worked with G.711 audio myself, but I assume you are converting each encoded value to an actual (linear) amplitude before processing it.
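G.711 comes in two variants, μ-law and A-law, and both store each sample as an 8-bit logarithmic code, so the bytes have to be expanded to linear values before averaging or squaring them means anything. A minimal sketch of μ-law expansion, assuming the chunks are μ-law encoded (A-law uses a different bit layout); `ulaw_to_linear` and `decode_ulaw` are hypothetical names, and the bit manipulation follows the widely circulated public-domain reference decoder rather than any particular library's API:

```cpp
#include <cstdint>
#include <vector>

// Expand one G.711 mu-law byte to a 16-bit linear PCM sample.
// Mu-law bytes are stored bit-inverted. After undoing the inversion,
// bit 7 set means a negative sample, bits 6..4 are the segment
// (exponent) and bits 3..0 the step within that segment.
int16_t ulaw_to_linear(uint8_t u_val) {
    constexpr int kBias = 0x84;                 // bias added by the encoder
    u_val = static_cast<uint8_t>(~u_val);       // undo the bit inversion
    int t = ((u_val & 0x0F) << 3) + kBias;      // step value plus bias
    t <<= (u_val & 0x70) >> 4;                  // scale by the segment
    return static_cast<int16_t>((u_val & 0x80) ? (kBias - t) : (t - kBias));
}

// Decode a whole chunk of mu-law bytes into linear samples.
std::vector<int16_t> decode_ulaw(const std::vector<uint8_t>& encoded) {
    std::vector<int16_t> pcm;
    pcm.reserve(encoded.size());
    for (uint8_t b : encoded) pcm.push_back(ulaw_to_linear(b));
    return pcm;
}
```

Decoded values span roughly ±32124, so after this step the chunk can be treated as ordinary signed 16-bit PCM.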

You'd expect the average value of most chunks to be approximately zero, because sound waveforms oscillate on either side of zero and the positive and negative samples largely cancel out.

A crude volume measure is the RMS (root mean square): take a rolling average of the squares of the samples, then take the square root of that average. This gives a positive quantity whenever there is some sound, and the quantity is related to the power represented in the waveform.
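A minimal sketch contrasting the plain mean with RMS, assuming the chunk has already been decoded to signed 16-bit linear samples; `mean_level` and `rms_level` are hypothetical names. It computes one RMS value per chunk, which matches how the question splits the audio; a rolling version would apply the same calculation over a sliding window of samples.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Arithmetic mean of the samples. For most audio this is close to zero,
// because positive and negative half-cycles cancel each other out.
double mean_level(const std::vector<int16_t>& samples) {
    if (samples.empty()) return 0.0;
    double sum = 0.0;
    for (int16_t s : samples) sum += s;
    return sum / static_cast<double>(samples.size());
}

// Root mean square: square each sample, average the squares, then take
// the square root. Always >= 0 and related to the power in the chunk.
double rms_level(const std::vector<int16_t>& samples) {
    if (samples.empty()) return 0.0;
    double sum_sq = 0.0;
    for (int16_t s : samples) {
        const double x = s;  // widen before squaring to avoid overflow
        sum_sq += x * x;
    }
    return std::sqrt(sum_sq / static_cast<double>(samples.size()));
}
```

For a chunk alternating between +1000 and -1000 the mean is 0, which is exactly the failure described in the question, while the RMS is 1000.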

For a measure closer to how humans perceive volume, you may want to investigate the techniques used in Replay Gain.



Answer 2:

If you're feeling ambitious, you can download the G.711 specification from the ITU website and spend the next few weeks (or maybe more) implementing it.

If you're lazier (or more sensible) than that, you can download G.191 instead: it includes source code to compress and decompress G.711-encoded data.
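The G.711 expansion step itself is fairly small. If the chunks are A-law rather than μ-law, a minimal sketch, assuming A-law input, might look like this; `alaw_to_linear` is a hypothetical name, not the function exported by the G.191 code, and the logic follows the common public-domain reference decoder:

```cpp
#include <cstdint>

// Expand one G.711 A-law byte to a 16-bit linear PCM sample.
// A-law bytes have their even bits inverted (XOR 0x55). After undoing
// that, bit 7 set means a positive sample, bits 6..4 are the segment
// and bits 3..0 the step within that segment.
int16_t alaw_to_linear(uint8_t a_val) {
    a_val ^= 0x55;                   // undo the even-bit inversion
    int t = (a_val & 0x0F) << 4;     // step within the segment
    const int seg = (a_val & 0x70) >> 4;
    if (seg == 0) {
        t += 8;                      // segment 0 is linear: add half a step
    } else {
        t += 0x108;                  // implicit leading bit plus half a step
        t <<= seg - 1;               // scale by the segment
    }
    return static_cast<int16_t>((a_val & 0x80) ? t : -t);
}
```

Comparing the output against the G.191 decoder on a few known bytes is a cheap way to confirm the sketch before relying on it.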

Once you've decoded it, visualizing the volume should be a whole lot easier.
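Once each chunk is decoded and reduced to an RMS value (as Answer 1 describes), a logarithmic scale tends to track perceived volume better than the linear RMS itself. A minimal sketch, assuming 16-bit samples with full scale taken as 32768 and a -60 dB display floor (both are illustrative choices; conventions vary); `rms_to_dbfs` and `volume_bar` are hypothetical names, and `std::clamp` requires C++17:

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <string>

// Convert the RMS of 16-bit samples to decibels relative to full scale.
// An RMS of 32768 maps to 0 dBFS; silence maps to -infinity.
double rms_to_dbfs(double rms) {
    if (rms <= 0.0) return -std::numeric_limits<double>::infinity();
    return 20.0 * std::log10(rms / 32768.0);
}

// Render a dBFS level as a fixed-width text bar: levels at or below
// floor_db give an empty bar, 0 dBFS gives a full one.
std::string volume_bar(double dbfs, double floor_db = -60.0, int width = 40) {
    double frac = (dbfs - floor_db) / (0.0 - floor_db);
    frac = std::clamp(frac, 0.0, 1.0);  // also handles -infinity
    const int filled = static_cast<int>(std::lround(frac * width));
    return std::string(filled, '#') + std::string(width - filled, '.');
}
```

Calling `volume_bar(rms_to_dbfs(rms))` once per chunk yields a 40-character bar per line; a GUI would draw a rectangle of the same proportional length instead.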