I was looking at this Web Audio API demo, part of this nice book
If you look at the demo, the fft peaks fall smoothly. I'm trying to do same with Processing in Java mode using the minim library. I've looked at how this is done with the web audio api in the doFFTAnalysis() method and tried to replicate this with minim. I also tried to port how abs() works with the complex type:
/ 26.2.7/3 abs(__z): Returns the magnitude of __z.
00565 template<typename _Tp>
00566 inline _Tp
00567 __complex_abs(const complex<_Tp>& __z)
00568 {
00569 _Tp __x = __z.real();
00570 _Tp __y = __z.imag();
00571 const _Tp __s = std::max(abs(__x), abs(__y));
00572 if (__s == _Tp()) // well ...
00573 return __s;
00574 __x /= __s;
00575 __y /= __s;
00576 return __s * sqrt(__x * __x + __y * __y);
00577 }
00578
I'm currently doing a quick prototype using Processing(a java framework/library). My code looks like this:
import ddf.minim.*;
import ddf.minim.analysis.*;
private int blockSize = 512;
private Minim minim;
private AudioInput in;
private FFT mfft;
private float[] time = new float[blockSize];//time domain
private float[] real = new float[blockSize];
private float[] imag = new float[blockSize];
private float[] freq = new float[blockSize];//smoothed freq. domain
public void setup() {
minim = new Minim(this);
in = minim.getLineIn(Minim.STEREO, blockSize);
mfft = new FFT( in.bufferSize(), in.sampleRate() );
}
public void draw() {
background(255);
for (int i = 0; i < blockSize; i++) time[i] = in.left.get(i);
mfft.forward( time);
real = mfft.getSpectrumReal();
imag = mfft.getSpectrumImaginary();
final float magnitudeScale = 1.0 / mfft.specSize();
final float k = (float)mouseX/width;
for (int i = 0; i < blockSize; i++)
{
float creal = real[i];
float cimag = imag[i];
float s = Math.max(creal,cimag);
creal /= s;
cimag /= s;
float absComplex = (float)(s * Math.sqrt(creal*creal + cimag*cimag));
float scalarMagnitude = absComplex * magnitudeScale;
freq[i] = (k * mfft.getBand(i) + (1 - k) * scalarMagnitude);
line( i, height, i, height - freq[i]*8 );
}
fill(0);
text("smoothing: " + k,10,10);
}
I'm not getting errors, which is good, but I'm not getting the expected behaviour which is bad. I expected the peaks to fall slower when smoothing(k) is close 1, but as far as I can tell my code only scales the bands.
Unfortunately math and sound isn't my strong point, so I'm stabbing in the dark. How can I replicate the nice visualisation from the Web Audio API demo ?
I would be tempted to say this can be language agnostic, but using javascript for example wouldn't apply :). However, I'm happy to try any other java library that does FFT analysis.
UPDATE
I've got a simple solution for smoothing (continuously diminish values of each previous fft band if the current fft band is not higher:
import ddf.minim.analysis.*;
import ddf.minim.*;
Minim minim;
AudioInput in;
FFT fft;
float smoothing = 0;
float[] fftReal;
float[] fftImag;
float[] fftSmooth;
int specSize;
void setup(){
size(640, 360, P3D);
minim = new Minim(this);
in = minim.getLineIn(Minim.STEREO, 512);
fft = new FFT(in.bufferSize(), in.sampleRate());
specSize = fft.specSize();
fftSmooth = new float[specSize];
fftReal = new float[specSize];
colorMode(HSB,specSize,100,100);
}
void draw(){
background(0);
stroke(255);
fft.forward( in.left);
fftReal = fft.getSpectrumReal();
fftImag = fft.getSpectrumImaginary();
for(int i = 0; i < specSize; i++)
{
float band = fft.getBand(i);
fftSmooth[i] *= smoothing;
if(fftSmooth[i] < band) fftSmooth[i] = band;
stroke(i,100,50);
line( i, height, i, height - fftSmooth[i]*8 );
stroke(i,100,100);
line( i, height, i, height - band*8 );
}
text("smoothing: " + (int)(smoothing*100),10,10);
}
void keyPressed(){
float inc = 0.01;
if(keyCode == UP && smoothing < 1-inc) smoothing += inc;
if(keyCode == DOWN && smoothing > inc) smoothing -= inc;
}
The faded graph is the smoothed one and the fully saturated one is the live one.
I am however still missing something, in comparison to the Web Audio API demo:
I think the Web Audio API might take into account that the medium and higher frequencies will need to be scaled to be closer to what we perceive, but I'm not sure how to tackle that.
I was trying to read more on how the RealtimeAnalyser class does this for the WebAudioAPI, but it seems FFTFrame class's doFFT
method might do the logarithmic scaling. I haven't figured out how doFFT works yet.
How can I scale a raw FFT graph with a logarithmic scale to account for perception ? My goal is to do a decent looking visualisation and my guess is i will need to:
- smooth values, otherwise elements will animate to fast/twitchy
- scale the FFT bins/bands to get better data for medium/high frequencies
- map process FFT values to visual elements (find the maximum values/bounds)
Any hints on how I can achieve this ?
UPDATE 2
I'm guessing this part does the smoothing and scaling I'm after in the Web Audio API: // Normalize so than an input sine wave at 0dBfs registers as 0dBfs (undo FFT scaling factor). const double magnitudeScale = 1.0 / DefaultFFTSize;
// A value of 0 does no averaging with the previous result. Larger values produce slower, but smoother changes.
double k = m_smoothingTimeConstant;
k = max(0.0, k);
k = min(1.0, k);
// Convert the analysis data from complex to magnitude and average with the previous result.
float* destination = magnitudeBuffer().data();
size_t n = magnitudeBuffer().size();
for (size_t i = 0; i < n; ++i) {
Complex c(realP[i], imagP[i]);
double scalarMagnitude = abs(c) * magnitudeScale;
destination[i] = float(k * destination[i] + (1 - k) * scalarMagnitude);
}
It seems the scaling is done by taking the absolute of the complex value. This post points in the same direction. I've tried using the abs of the complex number using Minim and using various window functions but it still doesn't look like what I'm aiming for(the Web Audio API demo):
import ddf.minim.analysis.*;
import ddf.minim.*;
Minim minim;
AudioInput in;
FFT fft;
float smoothing = 0;
float[] fftReal;
float[] fftImag;
float[] fftSmooth;
int specSize;
WindowFunction[] window = {FFT.NONE,FFT.HAMMING,FFT.HANN,FFT.COSINE,FFT.TRIANGULAR,FFT.BARTLETT,FFT.BARTLETTHANN,FFT.LANCZOS,FFT.BLACKMAN,FFT.GAUSS};
String[] wlabel = {"NONE","HAMMING","HANN","COSINE","TRIANGULAR","BARTLETT","BARTLETTHANN","LANCZOS","BLACKMAN","GAUSS"};
int windex = 0;
void setup(){
size(640, 360, P3D);
minim = new Minim(this);
in = minim.getLineIn(Minim.STEREO, 512);
fft = new FFT(in.bufferSize(), in.sampleRate());
fft.window(window[windex]);
specSize = fft.specSize();
fftSmooth = new float[specSize];
fftReal = new float[specSize];
colorMode(HSB,specSize,100,100);
}
void draw(){
background(0);
stroke(255);
fft.forward( in.mix);
fftReal = fft.getSpectrumReal();
fftImag = fft.getSpectrumImaginary();
for(int i = 0; i < specSize; i++)
{
float band = fft.getBand(i);
//Sw = abs(Sw(1:(1+N/2))); %# abs is sqrt(real^2 + imag^2)
float abs = sqrt(fftReal[i]*fftReal[i] + fftImag[i]*fftImag[i]);
fftSmooth[i] *= smoothing;
if(fftSmooth[i] < abs) fftSmooth[i] = abs;
stroke(i,100,50);
line( i, height, i, height - fftSmooth[i]*8 );
stroke(i,100,100);
line( i, height, i, height - band*8 );
}
text("smoothing: " + (int)(smoothing*100)+"\nwindow:"+wlabel[windex],10,10);
}
void keyPressed(){
float inc = 0.01;
if(keyCode == UP && smoothing < 1-inc) smoothing += inc;
if(keyCode == DOWN && smoothing > inc) smoothing -= inc;
if(key == 'W' && windex < window.length-1) windex++;
if(key == 'w' && windex > 0) windex--;
if(key == 'w' || key == 'W') fft.window(window[windex]);
}
I'm not sure I'm using the window functions correctly because I don't notice a huge difference between them. Is the abs of the complex value correct ? How can I get a visualisation closer to my aim ?
UPDATE 3
I've tried to apply @wakjah's helpful tips like so:
import ddf.minim.analysis.*;
import ddf.minim.*;
Minim minim;
AudioInput in;
FFT fft;
float smoothing = 0;
float[] fftReal;
float[] fftImag;
float[] fftSmooth;
float[] fftPrev;
float[] fftCurr;
int specSize;
WindowFunction[] window = {FFT.NONE,FFT.HAMMING,FFT.HANN,FFT.COSINE,FFT.TRIANGULAR,FFT.BARTLETT,FFT.BARTLETTHANN,FFT.LANCZOS,FFT.BLACKMAN,FFT.GAUSS};
String[] wlabel = {"NONE","HAMMING","HANN","COSINE","TRIANGULAR","BARTLETT","BARTLETTHANN","LANCZOS","BLACKMAN","GAUSS"};
int windex = 0;
int scale = 10;
void setup(){
minim = new Minim(this);
in = minim.getLineIn(Minim.STEREO, 512);
fft = new FFT(in.bufferSize(), in.sampleRate());
fft.window(window[windex]);
specSize = fft.specSize();
fftSmooth = new float[specSize];
fftPrev = new float[specSize];
fftCurr = new float[specSize];
size(specSize, specSize/2);
colorMode(HSB,specSize,100,100);
}
void draw(){
background(0);
stroke(255);
fft.forward( in.mix);
fftReal = fft.getSpectrumReal();
fftImag = fft.getSpectrumImaginary();
for(int i = 0; i < specSize; i++)
{
//float band = fft.getBand(i);
//Sw = abs(Sw(1:(1+N/2))); %# abs is sqrt(real^2 + imag^2)
//float abs = sqrt(fftReal[i]*fftReal[i] + fftImag[i]*fftImag[i]);
//fftSmooth[i] *= smoothing;
//if(fftSmooth[i] < abs) fftSmooth[i] = abs;
//x_dB = 10 * log10(real(x) ^ 2 + imag(x) ^ 2);
fftCurr[i] = scale * (float)Math.log10(fftReal[i]*fftReal[i] + fftImag[i]*fftImag[i]);
//Y[k] = alpha * Y_(t-1)[k] + (1 - alpha) * X[k]
fftSmooth[i] = smoothing * fftPrev[i] + ((1 - smoothing) * fftCurr[i]);
fftPrev[i] = fftCurr[i];//
stroke(i,100,100);
line( i, height, i, height - fftSmooth[i]);
}
text("smoothing: " + (int)(smoothing*100)+"\nwindow:"+wlabel[windex]+"\nscale:"+scale,10,10);
}
void keyPressed(){
float inc = 0.01;
if(keyCode == UP && smoothing < 1-inc) smoothing += inc;
if(keyCode == DOWN && smoothing > inc) smoothing -= inc;
if(key == 'W' && windex < window.length-1) windex++;
if(key == 'w' && windex > 0) windex--;
if(key == 'w' || key == 'W') fft.window(window[windex]);
if(keyCode == LEFT && scale > 1) scale--;
if(keyCode == RIGHT) scale++;
}
I'm not sure I've applied the hints as intended. Here's how my output looks:
but I don't think I'm there yet if I compare this with visualisations I'm aiming for:
spectrum in windows media player
spectrum in VLC player
I'm not sure I've applied the log scale correctly. My assumptions was, that I would a plot similar to what I'm aiming for after using log10 (ignoring smoothing for now).
UPDATE 4:
import ddf.minim.analysis.*;
import ddf.minim.*;
Minim minim;
AudioInput in;
FFT fft;
float smoothing = 0;
float[] fftReal;
float[] fftImag;
float[] fftSmooth;
float[] fftPrev;
float[] fftCurr;
int specSize;
WindowFunction[] window = {FFT.NONE,FFT.HAMMING,FFT.HANN,FFT.COSINE,FFT.TRIANGULAR,FFT.BARTLETT,FFT.BARTLETTHANN,FFT.LANCZOS,FFT.BLACKMAN,FFT.GAUSS};
String[] wlabel = {"NONE","HAMMING","HANN","COSINE","TRIANGULAR","BARTLETT","BARTLETTHANN","LANCZOS","BLACKMAN","GAUSS"};
int windex = 0;
int scale = 10;
void setup(){
minim = new Minim(this);
in = minim.getLineIn(Minim.STEREO, 512);
fft = new FFT(in.bufferSize(), in.sampleRate());
fft.window(window[windex]);
specSize = fft.specSize();
fftSmooth = new float[specSize];
fftPrev = new float[specSize];
fftCurr = new float[specSize];
size(specSize, specSize/2);
colorMode(HSB,specSize,100,100);
}
void draw(){
background(0);
stroke(255);
fft.forward( in.mix);
fftReal = fft.getSpectrumReal();
fftImag = fft.getSpectrumImaginary();
for(int i = 0; i < specSize; i++)
{
float maxVal = Math.max(Math.abs(fftReal[i]), Math.abs(fftImag[i]));
if (maxVal != 0.0f) { // prevent divide-by-zero
// Normalize
fftReal[i] = fftReal[i] / maxVal;
fftImag[i] = fftImag[i] / maxVal;
}
fftCurr[i] = -scale * (float)Math.log10(fftReal[i]*fftReal[i] + fftImag[i]*fftImag[i]);
fftSmooth[i] = smoothing * fftSmooth[i] + ((1 - smoothing) * fftCurr[i]);
stroke(i,100,100);
line( i, height/2, i, height/2 - (mousePressed ? fftSmooth[i] : fftCurr[i]));
}
text("smoothing: " + (int)(smoothing*100)+"\nwindow:"+wlabel[windex]+"\nscale:"+scale,10,10);
}
void keyPressed(){
float inc = 0.01;
if(keyCode == UP && smoothing < 1-inc) smoothing += inc;
if(keyCode == DOWN && smoothing > inc) smoothing -= inc;
if(key == 'W' && windex < window.length-1) windex++;
if(key == 'w' && windex > 0) windex--;
if(key == 'w' || key == 'W') fft.window(window[windex]);
if(keyCode == LEFT && scale > 1) scale--;
if(keyCode == RIGHT) scale++;
}
produces this:
In the draw loop I'm drawing from the centre since scale is now negative. If I scale the values up the result starts to look random.
UPDATE6
import ddf.minim.analysis.*;
import ddf.minim.*;
Minim minim;
AudioInput in;
FFT fft;
float smoothing = 0;
float[] fftReal;
float[] fftImag;
float[] fftSmooth;
float[] fftPrev;
float[] fftCurr;
int specSize;
WindowFunction[] window = {FFT.NONE,FFT.HAMMING,FFT.HANN,FFT.COSINE,FFT.TRIANGULAR,FFT.BARTLETT,FFT.BARTLETTHANN,FFT.LANCZOS,FFT.BLACKMAN,FFT.GAUSS};
String[] wlabel = {"NONE","HAMMING","HANN","COSINE","TRIANGULAR","BARTLETT","BARTLETTHANN","LANCZOS","BLACKMAN","GAUSS"};
int windex = 0;
int scale = 10;
void setup(){
minim = new Minim(this);
in = minim.getLineIn(Minim.STEREO, 512);
fft = new FFT(in.bufferSize(), in.sampleRate());
fft.window(window[windex]);
specSize = fft.specSize();
fftSmooth = new float[specSize];
fftPrev = new float[specSize];
fftCurr = new float[specSize];
size(specSize, specSize/2);
colorMode(HSB,specSize,100,100);
}
void draw(){
background(0);
stroke(255);
fft.forward( in.mix);
fftReal = fft.getSpectrumReal();
fftImag = fft.getSpectrumImaginary();
for(int i = 0; i < specSize; i++)
{
fftCurr[i] = scale * (float)Math.log10(fftReal[i]*fftReal[i] + fftImag[i]*fftImag[i]);
fftSmooth[i] = smoothing * fftSmooth[i] + ((1 - smoothing) * fftCurr[i]);
stroke(i,100,100);
line( i, height/2, i, height/2 - (mousePressed ? fftSmooth[i] : fftCurr[i]));
}
text("smoothing: " + (int)(smoothing*100)+"\nwindow:"+wlabel[windex]+"\nscale:"+scale,10,10);
}
void keyPressed(){
float inc = 0.01;
if(keyCode == UP && smoothing < 1-inc) smoothing += inc;
if(keyCode == DOWN && smoothing > inc) smoothing -= inc;
if(key == 'W' && windex < window.length-1) windex++;
if(key == 'w' && windex > 0) windex--;
if(key == 'w' || key == 'W') fft.window(window[windex]);
if(keyCode == LEFT && scale > 1) scale--;
if(keyCode == RIGHT) scale++;
if(key == 's') saveFrame("fftmod.png");
}
This feels so close:
This looks much better than the previous version, but some values on the lower/left side of the spectrum look a bit off and the shape seems to change very fast. (smoothed values plot zeroes)
I'm a little unclear on exactly what kind of smoothing you want to do, but I will try to provide some information that might help you.
Scaling FFT results for display
Generally, when you take the Fourier transform and you want to display a graph of it, you need (as you mention) to scale it logarithmically. This is because the magnitude of the values will vary over a huge range - many orders of magnitude - and compressing this into the small space observable on a graph will cause the main peaks to dwarf the rest of the information.
To actually do this scaling, we convert the values to decibels. It is important to note that decibels is a scale and not a unit - it represents a ratio between two numbers: usually a measured value and some reference. The general formula for decibels is
where
log10
is logarithm to base 10,^
is the power operator, andx_ref
is your chosen reference value. Since FFT'd values from an audio file don't (usually) have any meaningful units,x_ref
is commonly chosen to be simply1
for this application. Further, sincex
is complex, you need to take the absolute value. So the formula will beThere is a small (numerical and speed) optimisation possible here, since you're squaring the result of a square-root:
Perceptual weighting
Scaling of frequency-domain measurements is commonly done when measuring sound pressure and power levels: a specific measurement type is chosen for the given application (I won't go into the types here), and a recording of sound is made according to this measurement type. The result is FFT'd and then multiplied by a given weighting at each frequency depending on what the result will be used for and what type of sound has been recorded. There are two weightings in common use: A, and C. C is generally used only for extremely high amplitude sounds.
Note that this kind of weighting is not really necessary if you just want to display a nice-looking graph: it is used to make sure everyone in the world can make measurements (and measurement equipment) that follow the same standard. If you do decide to include this, it must be performed as a multiplication before conversion to decibels (or as an addition of the decibel value of the weighting - which is mathematically equivalent).
Info on the A-weighting is on wikipedia.
Windowing
Windowing is performed primarily to reduce the effect of the Gibbs phenomenon. We can never get rid of it completely but windowing does help. Unfortunately it has other effects: sharp peaks are broadened and "side-lobes" introduced; there is always a compromise between peak sharpness and side-lobe height. I am not going to go into all the details here unless you specifically ask for it; there is a fairly lengthy explanation of windowing in this free online book.
Time-domain smoothing of individual frequency bins
As for making the line in each frequency bin decay slowly, here's a simple idea that might do the trick: in each frequency bin, apply a simple exponential moving average. Say your FFT results are stored in
X[k]
, wherek
is the frequency index. Let your display value beY[k]
such thatwhere
0 < alpha < 1
is your smoothing factor, andY_(t-1)[k]
is the value ofY[k]
at the last time step (t-1
). This is actually a simple low-pass IIR (infinite impulse response) filter, and hopefully should do basically what you want (perhaps with a little tweaking). The closer alpha is to zero, the more quickly new observations (inputX[k]
) will affect the result. The closer it is to one, the more slowly the result will decay, but the input will also affect the result more slowly, so it may appear "sluggish". You may want to add a conditional around it to take the new value immediately if it's higher than the current value.Note that, again, this should be performed prior to conversion to decibels.
(edit) Having looked at the code you posted a little more clearly, this does appear to be the method used in the example you're trying to reproduce. Your initial attempt was close, but note that the first term is the smoothing coefficient multiplied by the last display value, not the current input.
(edit 2) Your third update is, again, close, but there is a slight mistranslation of the formula in the following lines
Instead of the previous value of the FFT coefficients before smoothing, you want to take the value after smoothing. (note that this means you don't actually need another array to store the previous value)
If
smoothing == 0
, this line should have little effect other than to multiply the result by a scalar.Normalization in the absolute value computation
Looking more closely at the way they compute the absolute value, they have a normalization in there, so that whichever of the two complex values is the maximum, becomes 1, and the other is scaled accordingly. This means you will always get an absolute value between 0 and 1, and is probably their alternative to decibel conversion. Really, this is not quite what the documentation of their
abs
function suggests, which is a little annoying... but anyway, if you replicate this it will guarantee that your values are always in a sensible range.To do this simply in your code, you could do something like
Putting it all together: Some code
Having messed around with it for a while in Processing 2.1, I have a solution that I believe you will be happy with:
The above uses a slightly different approach - averaging the spectrum in a series of bins that is smaller than the total spectrum size - that produces a result closer to WMP's than your original.
Enhancement: Now with A-weighting
I have an updated version of the code that applies the A-weighting in each frequency band (though only when dB mode is on, because the table I had was in dB :). Turn A-weighting on for a result closer to WMP's, or off for one closer to VLC's.
There are also some minor tweaks to the way it is displayed: it is now centred in the display and it will display only up to a maximum band centre frequency.
Here's the code - enjoy!
Inside your loop: you need to add a logarithmic computation for a lg scale:
Should be changed to: