I am working on a small example application for my fourth year project (dealing with Functional Reactive Programming). The idea is to create a simple program that can play a .wav file and then show a 'bouncing' animation of the current volume of the playing song (like in audio recording software). I'm building this in Scala, so I have mainly been looking at Java libraries and existing solutions.
Currently, I have managed to play a .wav file easily, but I can't seem to achieve the second goal. Basically, is there a way I can decode a .wav file so that I have some way of accessing its 'volume' at any given time? By volume I think I mean its amplitude, but I may be wrong about this - Higher Physics was a while ago....
Clearly, I don't know much about this at all so it would be great if someone could point me in the right direction!
In digital audio processing you typically refer to the momentary peak amplitude of the signal (this is also called PPM -- peak programme metering). Depending on how accurate you want to be, or whether you wish to model some standardised metering, you could either simply take the maximum absolute sample value within each display window, or model the ballistics (attack and decay times) of a standardised meter.
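For the simple, non-standardised case this boils down to taking the maximum absolute sample value in each display window. A minimal Scala sketch, assuming the samples have already been decoded to Doubles in the range -1.0 to 1.0 (the name peakLevel is just illustrative):

// Peak level of one display window: the largest absolute sample value.
// Assumes samples are normalised Doubles in [-1.0, 1.0].
def peakLevel(window: Array[Double]): Double =
  if (window.isEmpty) 0.0 else window.iterator.map(math.abs).max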
The other measuring mode is RMS, which is calculated by integrating over a certain time window (add the squared sample values, divide by the window length, and take the square root, thus root-mean-square, RMS). This gives a better idea of the 'energy' of the signal, moving more smoothly than peak measurements, but not capturing the maximum values observed. This mode is sometimes called a VU meter as well. You can approximate it with a sort of lagging (lowpass) filter, e.g.
y[i] = y[i-1]*a + |x[i]|*(1 - a)
for some value 0 < a < 1.
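A rough Scala sketch of that smoothing filter, again assuming samples already decoded to Doubles in [-1.0, 1.0]; the default a = 0.95 is an arbitrary example (values closer to 1 give a slower, smoother meter):

// One-pole smoothing (leaky integrator) over the absolute sample values.
// Returns one smoothed level per input sample.
def smoothedLevels(x: Array[Double], a: Double = 0.95): Array[Double] = {
  val y    = new Array[Double](x.length)
  var prev = 0.0
  var i    = 0
  while (i < x.length) {
    prev = prev * a + math.abs(x(i)) * (1 - a)
    y(i) = prev
    i += 1
  }
  y
}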
You typically display the values logarithmically, i.e. in decibels, as this corresponds better to our perception of signal strength, and for most signals it also produces a more regular coverage of your screen space.
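The conversion from a linear amplitude (0.0 to 1.0) is 20 * log10(amplitude) decibels relative to full scale. A small sketch; the -60 dB floor is just an arbitrary choice for the bottom of the meter:

// Linear amplitude (0.0 .. 1.0) to decibels full scale (dBFS).
// Clamping to a floor avoids -Infinity for silent input.
def toDecibels(amplitude: Double, floorDb: Double = -60.0): Double =
  if (amplitude <= 0.0) floorDb
  else math.max(floorDb, 20.0 * math.log10(amplitude))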
Three projects I'm involved with may help you:
In a wav file, the data at a given point in the stream IS the volume (shifted by half of the dynamic range). In other words, if you know what kind of wav file it is (for example, 8-bit mono), each byte represents a single sample. If you know the sample rate (say 44100 Hz), then multiply the time in seconds by 44100 and that is the index of the sample you want to look at.
The value of the byte is the volume (its distance from the midpoint: 0 and 255 are the peaks, 128 is silence). This assumes the encoding is plain PCM and not mu-law. I found some good info on how to tell the difference, or better yet, convert between these formats, here:
http://www.gnu.org/software/octave/doc/interpreter/Audio-Processing.html
You may want to average these samples over a window of some fixed number of samples, though.
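Putting this together, here is a rough Scala sketch using javax.sound.sampled from the JDK. It only handles the simple 8-bit mono PCM case described above, and the window size of 1024 samples is just an example:

import java.io.File
import javax.sound.sampled.AudioSystem

// Average distance from the 128 midpoint over fixed-size windows,
// normalised to roughly 0.0 .. 1.0. Assumes 8-bit mono PCM.
def windowedLevels(path: String, windowSize: Int = 1024): Vector[Double] = {
  val in     = AudioSystem.getAudioInputStream(new File(path))
  val format = in.getFormat
  require(format.getSampleSizeInBits == 8 && format.getChannels == 1,
    "this sketch assumes 8-bit mono PCM")
  val bytes = in.readAllBytes()   // Java 9+; otherwise call read() in a loop
  in.close()
  bytes.grouped(windowSize).map { window =>
    val sum = window.foldLeft(0.0)((acc, b) => acc + math.abs((b & 0xFF) - 128))
    sum / window.length / 128.0
  }.toVector
}

To look at a single point in time instead of a window, the sample index is simply (timeInSeconds * format.getSampleRate).toInt, as described above.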