I was looking at this Web Audio API demo, part of this nice book.
If you look at the demo, the FFT peaks fall smoothly. I'm trying to do the same with Processing in Java mode using the Minim library. I've looked at how this is done with the Web Audio API in the doFFTAnalysis() method and tried to replicate it with Minim. I also tried to port how abs() works with the C++ complex type:
// 26.2.7/3 abs(__z): Returns the magnitude of __z.
template<typename _Tp>
  inline _Tp
  __complex_abs(const complex<_Tp>& __z)
  {
    _Tp __x = __z.real();
    _Tp __y = __z.imag();
    const _Tp __s = std::max(abs(__x), abs(__y));
    if (__s == _Tp()) // well ...
      return __s;
    __x /= __s;
    __y /= __s;
    return __s * sqrt(__x * __x + __y * __y);
  }
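For reference, here is a minimal Java sketch of what the C++ snippet above computes. The class and method names (`ComplexAbs`, `magnitude`) are made up for illustration and are not part of Minim or Processing. Note that the scale factor is the larger of the two absolute values, and that a zero scale returns early to avoid dividing by zero.

```java
final class ComplexAbs {
    private ComplexAbs() {}

    /**
     * Magnitude of the complex number (re + i*im).
     * Both parts are divided by the larger absolute component first,
     * mirroring the C++ snippet, so the squares cannot overflow.
     */
    static float magnitude(float re, float im) {
        float s = Math.max(Math.abs(re), Math.abs(im));
        if (s == 0f) {
            return 0f; // both parts are zero, nothing to scale
        }
        re /= s;
        im /= s;
        return s * (float) Math.sqrt(re * re + im * im);
    }
}
```

As far as I can tell the rescaling only guards against overflow, so for typical audio amplitudes `sqrt(re*re + im*im)` (or `Math.hypot`) should give the same number.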
I'm currently doing a quick prototype using Processing (a Java framework/library). My code looks like this:
import ddf.minim.*;
import ddf.minim.analysis.*;

private int blockSize = 512;
private Minim minim;
private AudioInput in;
private FFT mfft;
private float[] time = new float[blockSize];//time domain
private float[] real = new float[blockSize];
private float[] imag = new float[blockSize];
private float[] freq = new float[blockSize];//smoothed freq. domain

public void setup() {
  minim = new Minim(this);
  in = minim.getLineIn(Minim.STEREO, blockSize);
  mfft = new FFT( in.bufferSize(), in.sampleRate() );
}

public void draw() {
  background(255);
  for (int i = 0; i < blockSize; i++) time[i] = in.left.get(i);
  mfft.forward( time);
  real = mfft.getSpectrumReal();
  imag = mfft.getSpectrumImaginary();

  final float magnitudeScale = 1.0 / mfft.specSize();
  final float k = (float)mouseX/width;

  for (int i = 0; i < blockSize; i++)
  {
    float creal = real[i];
    float cimag = imag[i];
    float s = Math.max(creal,cimag);
    creal /= s;
    cimag /= s;
    float absComplex = (float)(s * Math.sqrt(creal*creal + cimag*cimag));
    float scalarMagnitude = absComplex * magnitudeScale;
    freq[i] = (k * mfft.getBand(i) + (1 - k) * scalarMagnitude);

    line( i, height, i, height - freq[i]*8 );
  }
  fill(0);
  text("smoothing: " + k,10,10);
}
I'm not getting errors, which is good, but I'm not getting the expected behaviour, which is bad. I expected the peaks to fall more slowly when smoothing (k) is close to 1, but as far as I can tell my code only scales the bands.
Unfortunately math and sound aren't my strong points, so I'm stabbing in the dark. How can I replicate the nice visualisation from the Web Audio API demo?
I would be tempted to say this can be language agnostic, but using JavaScript, for example, wouldn't apply :). However, I'm happy to try any other Java library that does FFT analysis.
UPDATE
I've got a simple solution for smoothing: continuously diminish the stored value of each FFT band, unless the current value of that band is higher:
import ddf.minim.analysis.*;
import ddf.minim.*;

Minim minim;
AudioInput in;
FFT fft;

float smoothing = 0;
float[] fftReal;
float[] fftImag;
float[] fftSmooth;
int specSize;

void setup(){
  size(640, 360, P3D);
  minim = new Minim(this);
  in = minim.getLineIn(Minim.STEREO, 512);
  fft = new FFT(in.bufferSize(), in.sampleRate());
  specSize = fft.specSize();
  fftSmooth = new float[specSize];
  fftReal = new float[specSize];
  colorMode(HSB,specSize,100,100);
}

void draw(){
  background(0);
  stroke(255);

  fft.forward( in.left);
  fftReal = fft.getSpectrumReal();
  fftImag = fft.getSpectrumImaginary();
  for(int i = 0; i < specSize; i++)
  {
    float band = fft.getBand(i);

    fftSmooth[i] *= smoothing;
    if(fftSmooth[i] < band) fftSmooth[i] = band;
    stroke(i,100,50);
    line( i, height, i, height - fftSmooth[i]*8 );
    stroke(i,100,100);
    line( i, height, i, height - band*8 );
  }
  text("smoothing: " + (int)(smoothing*100),10,10);
}

void keyPressed(){
  float inc = 0.01;
  if(keyCode == UP && smoothing < 1-inc) smoothing += inc;
  if(keyCode == DOWN && smoothing > inc) smoothing -= inc;
}
The faded graph is the smoothed one and the fully saturated one is the live one.
I am however still missing something, in comparison to the Web Audio API demo:
I think the Web Audio API might take into account that the medium and higher frequencies will need to be scaled to be closer to what we perceive, but I'm not sure how to tackle that.
I was trying to read more on how the RealtimeAnalyser class does this for the Web Audio API, but it seems the FFTFrame class's doFFT method might do the logarithmic scaling. I haven't figured out how doFFT works yet.
How can I scale a raw FFT graph with a logarithmic scale to account for perception? My goal is to do a decent-looking visualisation, and my guess is I will need to:
- smooth values, otherwise elements will animate too fast/twitchy
- scale the FFT bins/bands to get better data for medium/high frequencies
- map processed FFT values to visual elements (find the maximum values/bounds)
Any hints on how I can achieve this?
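To make the second bullet concrete, here is a rough sketch of one way the linear FFT bins could be grouped into bands whose width grows geometrically, so that low frequencies get narrow bands and high frequencies get wide ones. `LogBands`, `edges` and `average` are hypothetical names, and this is only one of several possible groupings. I believe Minim's FFT also has `logAverages(minBandwidth, bandsPerOctave)` together with `avgSize()` and `getAvg(i)`, which may already cover this.

```java
final class LogBands {
    private LogBands() {}

    /**
     * Bin boundaries growing geometrically from bin 1 up to specSize.
     * Band b covers bins edges[b] .. edges[b + 1] - 1; bin 0 (DC) is skipped.
     */
    static int[] edges(int bands, int specSize) {
        int[] e = new int[bands + 1];
        for (int b = 0; b <= bands; b++) {
            e[b] = (int) Math.round(Math.pow(specSize, (double) b / bands));
            // keep every band at least one bin wide
            if (b > 0 && e[b] <= e[b - 1]) {
                e[b] = e[b - 1] + 1;
            }
        }
        return e;
    }

    /** Average of the spectrum values that fall inside each band. */
    static float[] average(float[] spectrum, int[] edges) {
        float[] out = new float[edges.length - 1];
        for (int b = 0; b < out.length; b++) {
            int lo = Math.min(edges[b], spectrum.length);
            int hi = Math.min(edges[b + 1], spectrum.length);
            float sum = 0f;
            for (int i = lo; i < hi; i++) {
                sum += spectrum[i];
            }
            out[b] = hi > lo ? sum / (hi - lo) : 0f; // empty band when too many bands are requested
        }
        return out;
    }
}
```

With a 512-sample buffer (257 bins) and 8 bands, the band widths come out as 1, 2, 4, 8, ... bins, which is roughly one band per octave.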
UPDATE 2
I'm guessing this part does the smoothing and scaling I'm after in the Web Audio API:
// Normalize so than an input sine wave at 0dBfs registers as 0dBfs (undo FFT scaling factor).
const double magnitudeScale = 1.0 / DefaultFFTSize;

// A value of 0 does no averaging with the previous result. Larger values produce slower, but smoother changes.
double k = m_smoothingTimeConstant;
k = max(0.0, k);
k = min(1.0, k);

// Convert the analysis data from complex to magnitude and average with the previous result.
float* destination = magnitudeBuffer().data();
size_t n = magnitudeBuffer().size();
for (size_t i = 0; i < n; ++i) {
    Complex c(realP[i], imagP[i]);
    double scalarMagnitude = abs(c) * magnitudeScale;
    destination[i] = float(k * destination[i] + (1 - k) * scalarMagnitude);
}
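As I read it, the important detail in the loop above is that `destination[i]` appears on both sides of the assignment: the previous smoothed value is blended with the new magnitude, so the buffer has to persist between frames. A minimal Java rendering of that idea might look like the sketch below. `SpectrumSmoother` and `update` are hypothetical names, and the real/imaginary arrays are assumed to come from something like Minim's `getSpectrumReal()` and `getSpectrumImaginary()`.

```java
final class SpectrumSmoother {
    private final float[] smoothed; // kept between frames, like magnitudeBuffer()
    private final double k;         // smoothing constant, clamped to [0, 1]

    SpectrumSmoother(int bins, double smoothingTimeConstant) {
        this.smoothed = new float[bins];
        this.k = Math.max(0.0, Math.min(1.0, smoothingTimeConstant));
    }

    /** Blends the magnitude of each new bin into the stored, smoothed value. */
    float[] update(float[] real, float[] imag, double magnitudeScale) {
        for (int i = 0; i < smoothed.length; i++) {
            double re = real[i];
            double im = imag[i];
            double magnitude = Math.sqrt(re * re + im * im) * magnitudeScale;
            smoothed[i] = (float) (k * smoothed[i] + (1 - k) * magnitude);
        }
        return smoothed;
    }
}
```

With k = 0 the output simply follows the input; the closer k gets to 1, the more slowly a peak decays after the input drops.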
It seems the scaling is done by taking the absolute value (magnitude) of the complex number. This post points in the same direction. I've tried taking the abs of the complex number with Minim and using various window functions, but it still doesn't look like what I'm aiming for (the Web Audio API demo):
import ddf.minim.analysis.*;
import ddf.minim.*;

Minim minim;
AudioInput in;
FFT fft;

float smoothing = 0;
float[] fftReal;
float[] fftImag;
float[] fftSmooth;
int specSize;

WindowFunction[] window = {FFT.NONE,FFT.HAMMING,FFT.HANN,FFT.COSINE,FFT.TRIANGULAR,FFT.BARTLETT,FFT.BARTLETTHANN,FFT.LANCZOS,FFT.BLACKMAN,FFT.GAUSS};
String[] wlabel = {"NONE","HAMMING","HANN","COSINE","TRIANGULAR","BARTLETT","BARTLETTHANN","LANCZOS","BLACKMAN","GAUSS"};
int windex = 0;

void setup(){
  size(640, 360, P3D);
  minim = new Minim(this);
  in = minim.getLineIn(Minim.STEREO, 512);
  fft = new FFT(in.bufferSize(), in.sampleRate());
  fft.window(window[windex]);
  specSize = fft.specSize();
  fftSmooth = new float[specSize];
  fftReal = new float[specSize];
  colorMode(HSB,specSize,100,100);
}

void draw(){
  background(0);
  stroke(255);

  fft.forward( in.mix);
  fftReal = fft.getSpectrumReal();
  fftImag = fft.getSpectrumImaginary();
  for(int i = 0; i < specSize; i++)
  {
    float band = fft.getBand(i);

    //Sw = abs(Sw(1:(1+N/2))); %# abs is sqrt(real^2 + imag^2)
    float abs = sqrt(fftReal[i]*fftReal[i] + fftImag[i]*fftImag[i]);

    fftSmooth[i] *= smoothing;
    if(fftSmooth[i] < abs) fftSmooth[i] = abs;
    stroke(i,100,50);
    line( i, height, i, height - fftSmooth[i]*8 );
    stroke(i,100,100);
    line( i, height, i, height - band*8 );
  }
  text("smoothing: " + (int)(smoothing*100)+"\nwindow:"+wlabel[windex],10,10);
}

void keyPressed(){
  float inc = 0.01;
  if(keyCode == UP && smoothing < 1-inc) smoothing += inc;
  if(keyCode == DOWN && smoothing > inc) smoothing -= inc;
  if(key == 'W' && windex < window.length-1) windex++;
  if(key == 'w' && windex > 0) windex--;
  if(key == 'w' || key == 'W') fft.window(window[windex]);
}
I'm not sure I'm using the window functions correctly, because I don't notice a huge difference between them. Is the abs of the complex value correct? How can I get a visualisation closer to my aim?
UPDATE 3
I've tried to apply @wakjah's helpful tips like so:
import ddf.minim.analysis.*;
import ddf.minim.*;

Minim minim;
AudioInput in;
FFT fft;

float smoothing = 0;
float[] fftReal;
float[] fftImag;
float[] fftSmooth;
float[] fftPrev;
float[] fftCurr;
int specSize;

WindowFunction[] window = {FFT.NONE,FFT.HAMMING,FFT.HANN,FFT.COSINE,FFT.TRIANGULAR,FFT.BARTLETT,FFT.BARTLETTHANN,FFT.LANCZOS,FFT.BLACKMAN,FFT.GAUSS};
String[] wlabel = {"NONE","HAMMING","HANN","COSINE","TRIANGULAR","BARTLETT","BARTLETTHANN","LANCZOS","BLACKMAN","GAUSS"};
int windex = 0;

int scale = 10;

void setup(){
  minim = new Minim(this);
  in = minim.getLineIn(Minim.STEREO, 512);
  fft = new FFT(in.bufferSize(), in.sampleRate());
  fft.window(window[windex]);
  specSize = fft.specSize();
  fftSmooth = new float[specSize];
  fftPrev = new float[specSize];
  fftCurr = new float[specSize];
  size(specSize, specSize/2);
  colorMode(HSB,specSize,100,100);
}

void draw(){
  background(0);
  stroke(255);

  fft.forward( in.mix);
  fftReal = fft.getSpectrumReal();
  fftImag = fft.getSpectrumImaginary();
  for(int i = 0; i < specSize; i++)
  {
    //float band = fft.getBand(i);
    //Sw = abs(Sw(1:(1+N/2))); %# abs is sqrt(real^2 + imag^2)
    //float abs = sqrt(fftReal[i]*fftReal[i] + fftImag[i]*fftImag[i]);
    //fftSmooth[i] *= smoothing;
    //if(fftSmooth[i] < abs) fftSmooth[i] = abs;

    //x_dB = 10 * log10(real(x) ^ 2 + imag(x) ^ 2);
    fftCurr[i] = scale * (float)Math.log10(fftReal[i]*fftReal[i] + fftImag[i]*fftImag[i]);
    //Y[k] = alpha * Y_(t-1)[k] + (1 - alpha) * X[k]
    fftSmooth[i] = smoothing * fftPrev[i] + ((1 - smoothing) * fftCurr[i]);

    fftPrev[i] = fftCurr[i];//

    stroke(i,100,100);
    line( i, height, i, height - fftSmooth[i]);
  }
  text("smoothing: " + (int)(smoothing*100)+"\nwindow:"+wlabel[windex]+"\nscale:"+scale,10,10);
}

void keyPressed(){
  float inc = 0.01;
  if(keyCode == UP && smoothing < 1-inc) smoothing += inc;
  if(keyCode == DOWN && smoothing > inc) smoothing -= inc;
  if(key == 'W' && windex < window.length-1) windex++;
  if(key == 'w' && windex > 0) windex--;
  if(key == 'w' || key == 'W') fft.window(window[windex]);
  if(keyCode == LEFT && scale > 1) scale--;
  if(keyCode == RIGHT) scale++;
}
I'm not sure I've applied the hints as intended. Here's how my output looks:
but I don't think I'm there yet if I compare this with visualisations I'm aiming for:
spectrum in windows media player
spectrum in VLC player
I'm not sure I've applied the log scale correctly. My assumption was that I would get a plot similar to what I'm aiming for after using log10 (ignoring smoothing for now).
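One thing worth spelling out here: 10 * log10(power) is negative whenever the power is below 1 and is -Infinity for a bin that is exactly zero, so the raw dB value cannot be used directly as a line height. Below is a small sketch of how a dB value could be floored and then mapped from a chosen dB window onto pixels. `DbScale`, `toDb`, `toPixels`, `minDb`, `maxDb` and the 1e-20 floor are all hypothetical choices for illustration, not values taken from Minim or the Web Audio API.

```java
final class DbScale {
    private DbScale() {}

    /** Power of one complex bin in dB, floored so that silence never yields -Infinity. */
    static float toDb(float re, float im) {
        double power = (double) re * re + (double) im * im;
        return (float) (10.0 * Math.log10(Math.max(power, 1e-20)));
    }

    /** Maps a dB value from the window [minDb, maxDb] onto [0, height], clamping outside it. */
    static float toPixels(float db, float minDb, float maxDb, float height) {
        float t = (db - minDb) / (maxDb - minDb);
        t = Math.max(0f, Math.min(1f, t));
        return t * height;
    }
}
```

In a sketch like the one above this could be used as `line(i, height, i, height - DbScale.toPixels(db, -60, 0, height))`, where the -60..0 window is only a guess and would need tuning against the actual input level.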
UPDATE 4:
import ddf.minim.analysis.*;
import ddf.minim.*;

Minim minim;
AudioInput in;
FFT fft;

float smoothing = 0;
float[] fftReal;
float[] fftImag;
float[] fftSmooth;
float[] fftPrev;
float[] fftCurr;
int specSize;

WindowFunction[] window = {FFT.NONE,FFT.HAMMING,FFT.HANN,FFT.COSINE,FFT.TRIANGULAR,FFT.BARTLETT,FFT.BARTLETTHANN,FFT.LANCZOS,FFT.BLACKMAN,FFT.GAUSS};
String[] wlabel = {"NONE","HAMMING","HANN","COSINE","TRIANGULAR","BARTLETT","BARTLETTHANN","LANCZOS","BLACKMAN","GAUSS"};
int windex = 0;

int scale = 10;

void setup(){
  minim = new Minim(this);
  in = minim.getLineIn(Minim.STEREO, 512);
  fft = new FFT(in.bufferSize(), in.sampleRate());
  fft.window(window[windex]);
  specSize = fft.specSize();
  fftSmooth = new float[specSize];
  fftPrev = new float[specSize];
  fftCurr = new float[specSize];
  size(specSize, specSize/2);
  colorMode(HSB,specSize,100,100);
}

void draw(){
  background(0);
  stroke(255);

  fft.forward( in.mix);
  fftReal = fft.getSpectrumReal();
  fftImag = fft.getSpectrumImaginary();
  for(int i = 0; i < specSize; i++)
  {
    float maxVal = Math.max(Math.abs(fftReal[i]), Math.abs(fftImag[i]));
    if (maxVal != 0.0f) { // prevent divide-by-zero
      // Normalize
      fftReal[i] = fftReal[i] / maxVal;
      fftImag[i] = fftImag[i] / maxVal;
    }

    fftCurr[i] = -scale * (float)Math.log10(fftReal[i]*fftReal[i] + fftImag[i]*fftImag[i]);
    fftSmooth[i] = smoothing * fftSmooth[i] + ((1 - smoothing) * fftCurr[i]);

    stroke(i,100,100);
    line( i, height/2, i, height/2 - (mousePressed ? fftSmooth[i] : fftCurr[i]));
  }
  text("smoothing: " + (int)(smoothing*100)+"\nwindow:"+wlabel[windex]+"\nscale:"+scale,10,10);
}

void keyPressed(){
  float inc = 0.01;
  if(keyCode == UP && smoothing < 1-inc) smoothing += inc;
  if(keyCode == DOWN && smoothing > inc) smoothing -= inc;
  if(key == 'W' && windex < window.length-1) windex++;
  if(key == 'w' && windex > 0) windex--;
  if(key == 'w' || key == 'W') fft.window(window[windex]);
  if(keyCode == LEFT && scale > 1) scale--;
  if(keyCode == RIGHT) scale++;
}
produces this:
In the draw loop I'm drawing from the centre since scale is now negative. If I scale the values up the result starts to look random.
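A possible reason the result looks random: in the listing above, each bin is divided by max(|re|, |im|) before log10 is taken, and the result is not multiplied back by that maximum (the C++ abs() does multiply it back). After that division the larger component is always ±1 and the other lies in [-1, 1], so re² + im² is always between 1 and 2, whatever the loudness. The small sketch below (with the hypothetical names `NormalizedPower` and `normalizedPower`) just reproduces that arithmetic to show the effect.

```java
final class NormalizedPower {
    private NormalizedPower() {}

    /** re^2 + im^2 after both parts are divided by the larger absolute component. */
    static double normalizedPower(double re, double im) {
        double m = Math.max(Math.abs(re), Math.abs(im));
        if (m != 0.0) { // prevent divide-by-zero, as in the sketch above
            re /= m;
            im /= m;
        }
        return re * re + im * im;
    }
}
```

For example, a loud bin (1000, 0) and a quiet bin (0.001, 0) both come out as exactly 1.0, so log10 of the result stays within 0 to about 0.301 and mostly reflects the ratio between the real and imaginary parts rather than the level.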
UPDATE 6
import ddf.minim.analysis.*;
import ddf.minim.*;

Minim minim;
AudioInput in;
FFT fft;

float smoothing = 0;
float[] fftReal;
float[] fftImag;
float[] fftSmooth;
float[] fftPrev;
float[] fftCurr;
int specSize;

WindowFunction[] window = {FFT.NONE,FFT.HAMMING,FFT.HANN,FFT.COSINE,FFT.TRIANGULAR,FFT.BARTLETT,FFT.BARTLETTHANN,FFT.LANCZOS,FFT.BLACKMAN,FFT.GAUSS};
String[] wlabel = {"NONE","HAMMING","HANN","COSINE","TRIANGULAR","BARTLETT","BARTLETTHANN","LANCZOS","BLACKMAN","GAUSS"};
int windex = 0;

int scale = 10;

void setup(){
  minim = new Minim(this);
  in = minim.getLineIn(Minim.STEREO, 512);
  fft = new FFT(in.bufferSize(), in.sampleRate());
  fft.window(window[windex]);
  specSize = fft.specSize();
  fftSmooth = new float[specSize];
  fftPrev = new float[specSize];
  fftCurr = new float[specSize];
  size(specSize, specSize/2);
  colorMode(HSB,specSize,100,100);
}

void draw(){
  background(0);
  stroke(255);

  fft.forward( in.mix);
  fftReal = fft.getSpectrumReal();
  fftImag = fft.getSpectrumImaginary();
  for(int i = 0; i < specSize; i++)
  {
    fftCurr[i] = scale * (float)Math.log10(fftReal[i]*fftReal[i] + fftImag[i]*fftImag[i]);
    fftSmooth[i] = smoothing * fftSmooth[i] + ((1 - smoothing) * fftCurr[i]);

    stroke(i,100,100);
    line( i, height/2, i, height/2 - (mousePressed ? fftSmooth[i] : fftCurr[i]));
  }
  text("smoothing: " + (int)(smoothing*100)+"\nwindow:"+wlabel[windex]+"\nscale:"+scale,10,10);
}

void keyPressed(){
  float inc = 0.01;
  if(keyCode == UP && smoothing < 1-inc) smoothing += inc;
  if(keyCode == DOWN && smoothing > inc) smoothing -= inc;
  if(key == 'W' && windex < window.length-1) windex++;
  if(key == 'w' && windex > 0) windex--;
  if(key == 'w' || key == 'W') fft.window(window[windex]);
  if(keyCode == LEFT && scale > 1) scale--;
  if(keyCode == RIGHT) scale++;
  if(key == 's') saveFrame("fftmod.png");
}
This feels so close:
This looks much better than the previous version, but some values on the lower/left side of the spectrum look a bit off and the shape seems to change very fast (the smoothed values plot as zeroes).
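One possible cause of the stuck smoothed values, which I have not confirmed: if a bin is ever exactly zero (for instance during silence), log10 gives -Infinity, and once that value enters the running average it can never leave. With smoothing at 0 the next frame computes 0 * -Infinity, which is NaN in Java, and NaN then propagates forever. The sketch below demonstrates that arithmetic and shows the alternative ordering of smoothing the linear power first and converting to dB afterwards, with a floor. `SmoothThenDb`, `smooth`, `toDb` and the 1e-20 floor are hypothetical.

```java
final class SmoothThenDb {
    private SmoothThenDb() {}

    /** One step of the running average: k * previous + (1 - k) * current. */
    static float smooth(float previous, float current, float k) {
        return k * previous + (1 - k) * current;
    }

    /** Converts an already smoothed linear power value to dB, floored to stay finite. */
    static float toDb(float power) {
        return (float) (10.0 * Math.log10(Math.max(power, 1e-20f)));
    }
}
```

Smoothing linear values also matches the order used in the Web Audio snippet quoted earlier, where the magnitude is averaged before any further conversion.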