Matching two series of MFCC coefficients

Posted 2020-07-25 17:31

Question:

I have extracted two series of MFCC coefficients from two audio files, each about 30 seconds long, containing the same speech content. The files were recorded at the same location from different sources. I need to estimate whether the audio contains the same conversation or a different one. So far I have tried computing the correlation between the two MFCC series, but the result is not very meaningful. Are there best practices for this scenario?

Answer 1:

I had the same problem, and the solution was to match the two arrays of MFCCs using the Dynamic Time Warping (DTW) algorithm.

After computing the MFCCs you should have, for each of your two signals, an array in which each element contains the MFCCs for one frame (an array of arrays). The first step is to compute the "distance" between every element of one array and every element of the other, i.e. between every pair of MFCC vectors (you could try the Euclidean distance for this).

This should leave you with a 2-dimensional array (let's call it "dist") where element (i,j) represents the distance between the MFCCs of the i-th frame in the first signal and the MFCCs of the j-th frame of your second signal.
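A minimal NumPy/SciPy sketch of this "dist" matrix, assuming each signal's MFCCs are stored as a 2-D array with one row per frame (if your extractor returns coefficients × frames, transpose it first). The helper name frame_distance_matrix is hypothetical; scipy.spatial.distance.cdist does the pairwise work.

```python
import numpy as np
from scipy.spatial.distance import cdist


def frame_distance_matrix(mfcc_a: np.ndarray, mfcc_b: np.ndarray) -> np.ndarray:
    """Euclidean distance between every frame of mfcc_a and every frame of mfcc_b.

    mfcc_a has shape (n, k) and mfcc_b has shape (m, k): one row per frame,
    k MFCC coefficients per frame. The result has shape (n, m), and
    dist[i, j] compares frame i of the first signal with frame j of the second.
    """
    return cdist(mfcc_a, mfcc_b, metric="euclidean")


# Tiny illustration with 2 coefficients per frame.
a = np.array([[0.0, 0.0], [3.0, 4.0]])
b = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]])
dist = frame_distance_matrix(a, b)
print(dist.shape)  # (2, 3)
```

Note that indices here are 0-based, whereas the recurrence below uses 1-based indices.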

On this array you can now apply the DTW algorithm:

  • dtw(1,1) = dist(1,1)
  • dtw(i,j) = min(dtw(i-1, j-1), dtw(i-1, j), dtw(i, j-1)) + dist(i,j), where any term with an index of 0 counts as infinity.

The value representing the "difference" between your two files is dtw(n,m), where n is the number of frames in the first signal and m is the number of frames in the second.
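The recurrence can be sketched in Python as follows. This is an illustrative implementation (the name dtw_cost is hypothetical), assuming dist is an (n, m) NumPy array of frame distances; padding with an extra row and column of infinity handles the first row and column, where some predecessors do not exist.

```python
import numpy as np


def dtw_cost(dist: np.ndarray) -> float:
    """Accumulated DTW cost for a precomputed (n, m) frame distance matrix.

    acc[i, j] (1-based, thanks to the padding) follows the answer's recurrence:
    acc[i, j] = dist(i, j) + min(acc[i-1, j-1], acc[i-1, j], acc[i, j-1]).
    """
    n, m = dist.shape
    # Row 0 and column 0 are padding: infinity everywhere except acc[0, 0] = 0,
    # so acc[1, 1] becomes exactly dist[0, 0].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j - 1],  # advance both signals by one frame
                acc[i - 1, j],      # advance only the first signal
                acc[i, j - 1],      # advance only the second signal
            )
    # acc[n, m] is dtw(n, m) in the answer's 1-based notation.
    return float(acc[n, m])
```

The double loop is O(n·m) in pure Python, so for recordings with thousands of frames a vectorized or compiled DTW implementation will be much faster; the recurrence itself is the same.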

For further reading, this paper gives an overview of applying DTW to MFCCs, and this presentation of the DTW algorithm might also help.



Answer 2:

If you treat the two vectors as histograms, you might want to try calculating the chi-squared distance between them (a common distance measure for histograms). Note that raw MFCCs can be negative, while this measure assumes non-negative values.

d(x, y) = sum over i of (x(i) - y(i))^2 / (2 * (x(i) + y(i)))

A good (MEX) implementation can be found in this toolbox:

http://www.mathworks.com/matlabcentral/fileexchange/15935-computing-pairwise-distances-and-metrics

Call as follows:

d = slmetric_pw(X, Y, 'chisq');  % X: k-by-m, Y: k-by-n, one feature vector per column; d is m-by-n
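For readers working in Python rather than MATLAB, the same chi-squared formula can be sketched with NumPy broadcasting. This is not the toolbox's code: the function name is hypothetical, each row of X and Y is one feature vector, bins where both values are 0 contribute nothing, and negative inputs are rejected because the formula only makes sense for histogram-like (non-negative) features.

```python
import numpy as np


def chi_squared_pairwise(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Pairwise chi-squared distances between the rows of X and the rows of Y.

    X has shape (m, k), Y has shape (n, k); the result d has shape (m, n) with
    d[i, j] = sum over c of (X[i, c] - Y[j, c])**2 / (2 * (X[i, c] + Y[j, c])).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if (X < 0).any() or (Y < 0).any():
        raise ValueError("chi-squared distance assumes non-negative features")
    # Broadcast to shape (m, n, k): every row of X against every row of Y.
    diff = X[:, None, :] - Y[None, :, :]
    total = X[:, None, :] + Y[None, :, :]
    # Where both bins are 0 the term stays 0 instead of evaluating 0/0.
    terms = np.divide(diff**2, 2.0 * total,
                      out=np.zeros_like(diff), where=total > 0)
    return terms.sum(axis=-1)
```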


Answer 3:

I faced the same problem recently. The best approach I found was to use MIRtoolbox, a MATLAB toolbox with extensive audio-analysis functionality.

After adding the toolbox to your MATLAB path, you can compute the distance between two sets of MFCCs with the following call (a lower distance means a closer match):

dist = mirgetdata(mirdist(mfcc1, mfcc2));  % mfcc1, mfcc2: objects returned by mirmfcc


Answer 4:

I know this question has been around for almost 10 years, but I was searching for the same thing just now and personally found the suggestions above too complicated. For anyone still searching, you can start by simply using SciPy to compute the distance between two matrices of MFCC data:

>>> from scipy.spatial import minkowski_distance
>>> a = [[-2.231413e+01,-5.495589e+01,-2.177988e+01,-1.719458e+01,-1.513321e+01,1.324277e+01,-9.265136e-01,1.542478e+01,1.007597e+01,7.356851e-01,1.106412e+01,-9.447377e+00,-1.325694e+00 ],[-2.294377e+01,-5.487790e+01,-2.152807e+01,-1.725173e+01,-1.500316e+01,1.287956e+01,-7.995839e-01,1.540848e+01,1.040512e+01,3.215451e-01,1.113061e+01,-9.390820e+00,-1.065433e+00 ], [-2.251059e+01,-5.475804e+01,-2.188462e+01,-1.709198e+01,-1.516142e+01,1.278525e+01,-7.952995e-01,1.602424e+01,9.981795e+00,4.940354e-01,1.081703e+01,-9.485857e+00,-7.487018e-01 ]]
>>> b = [[-2.231413e+01,-5.495589e+01,-2.177988e+01,-1.719458e+01,-1.513321e+01,1.324277e+01,-9.265136e-01,1.542478e+01,1.007597e+01,7.356851e-01,1.106412e+01,-9.447377e+00,-1.325694e+00 ], [-2.294327e+01,-5.488413e+01,-2.152952e+01,-1.724601e+01,-1.500094e+01,1.287461e+01,-8.023301e-01,1.541246e+01,1.040808e+01,3.185866e-01,1.112774e+01,-9.388848e+00,-1.062943e+00], [-2.250507e+01,-5.481581e+01,-2.189883e+01,-1.704281e+01,-1.514221e+01,1.274256e+01,-8.183736e-01,1.606115e+01,1.000806e+01,4.662135e-01,1.079070e+01,-9.468561e+00,-7.260294e-01 ]]
>>> minkowski_distance(a, b)
array([0.        , 0.01274899, 0.11421053])

https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.minkowski_distance.html
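minkowski_distance returns one distance per frame pair, so deciding "same conversation or not" still needs a single score. A minimal sketch, assuming both MFCC arrays are frame-aligned (same start time and frame rate) and that the mean is an acceptable summary; the function name and the truncation to the shorter array are illustrative choices, not part of the answer above.

```python
import numpy as np
from scipy.spatial import minkowski_distance


def mean_frame_distance(a: np.ndarray, b: np.ndarray, p: float = 2) -> float:
    """Average Minkowski distance between corresponding frames of a and b.

    a and b have shape (frames, coefficients). Only the first
    min(len(a), len(b)) frames are compared (an assumption for
    illustration), so any time shift between the recordings is not handled.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = min(len(a), len(b))
    # One distance per frame pair (computed along the coefficient axis).
    per_frame = minkowski_distance(a[:n], b[:n], p)
    return float(np.mean(per_frame))
```

Where to draw the line between "same" and "different" would have to be tuned on your own recordings; if the files are not time-aligned, the DTW approach from Answer 1 is the more robust option.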

To extract the MFCC data I used Yaafe (packaged in a Docker container): http://yaafe.github.io/Yaafe/manual/install.html

Here is how to work around the installation issue: https://github.com/Yaafe/Yaafe/issues/52