How to perform DTW on an array of MFCC coefficients

Posted 2019-04-16 11:00

I'm currently working on a speech recognition project in MATLAB. I've taken two voice signals and extracted the MFCC coefficients of each. As far as I know, I should now calculate the Euclidean distance between the two and then apply the DTW algorithm. That's why I calculated the distance between the two and got an array of distances. So my question is: how do I implement DTW on the resulting array?

Here's my MATLAB code:

clear all; close all; clc;

% Define variables
Tw = 25;                % analysis frame duration (ms)
Ts = 10;                % analysis frame shift (ms)
alpha = 0.97;           % preemphasis coefficient
M = 20;                 % number of filterbank channels 
C = 12;                 % number of cepstral coefficients
L = 22;                 % cepstral sine lifter parameter
LF = 300;               % lower frequency limit (Hz)
HF = 3700;              % upper frequency limit (Hz)
wav_file = 'Play.wav';  % input audio filename
wav_file1 = 'Next.wav';


% Read speech samples, sampling rate and precision from file
[ speech, fs, nbits ] = wavread( wav_file );
[ speech1, fs, nbits ] = wavread( wav_file1 );

% Feature extraction (feature vectors as columns)
[ MFCCs, FBEs, frames ] = ...
                mfcc( speech, fs, Tw, Ts, alpha, @hamming, [LF HF], M, C+1, L );
[ MFCC1s, FBEs, frames ] = ...
                mfcc( speech1, fs, Tw, Ts, alpha, @hamming, [LF HF], M, C+1, L );

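% Pairwise Euclidean distances between frames. pdist2 treats rows as
% observations, so transpose to put one frame per row. The result goes
% into D so that the lifter parameter L is not overwritten.
D = pdist2(MFCCs', MFCC1s', 'euclidean');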

2 Answers
Bombasti
#2 · 2019-04-16 11:36

Disclaimer: I'm not a MATLAB user.

I think there may be a misconception in your statement "I should now calculate the Euclidean distance between the two and then apply the DTW algorithm".

The point of using DTW is to compare two series (the MFCC series for wav 1 and for wav 2). The two recordings will most likely differ in duration, so you end up with two sets of MFCC vectors of different sizes. DTW lets you compare the two MFCC series regardless of their lengths (see https://en.wikipedia.org/wiki/Dynamic_time_warping).

So, for example, if you have extracted 3 MFCC feature vectors for wav 1 and 5 MFCC feature vectors for wav 2, applying DTW lets you compare them and obtain the difference, or distance, between them. You don't have to calculate a distance "before" DTW; you use DTW to calculate it (in fact, I don't know how else I would calculate a distance between two series of different lengths).
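To make this concrete, here is a minimal sketch of DTW written out by hand. It is not taken from the question or from any toolbox. It assumes the textbook step pattern (diagonal, vertical, horizontal moves) and uses two toy matrices, X and Y, as hypothetical stand-ins for the MFCC matrices, with one frame per column. The matrix D plays the role of the frame-to-frame distance array from the question: DTW uses it as its local cost, and the accumulated cost in the last cell is the DTW distance.

```matlab
% Toy stand-ins for the MFCC matrices (assumed layout: one frame per column).
X = [0 1 2 3; 0 1 2 3];            % 4 frames, 2 coefficients each
Y = [0 0 1 2 2 3; 0 0 1 2 2 3];    % 6 frames: same shape, stretched in time

% Local cost: D(i,j) is the Euclidean distance between frame i of X
% and frame j of Y.
n = size(X, 2);
m = size(Y, 2);
D = zeros(n, m);
for i = 1:n
    for j = 1:m
        D(i, j) = norm(X(:, i) - Y(:, j));
    end
end

% Accumulated cost. Acc is padded with one extra row and column of Inf so
% that the recursion needs no special cases at the borders.
Acc = inf(n + 1, m + 1);
Acc(1, 1) = 0;
for i = 1:n
    for j = 1:m
        best = min([Acc(i, j), Acc(i, j + 1), Acc(i + 1, j)]);
        Acc(i + 1, j + 1) = D(i, j) + best;
    end
end

dtwDist = Acc(n + 1, m + 1);       % 0 here: Y is just a time-stretched X
```

With real features, X and Y would be replaced by the two MFCC matrices, and a smaller dtwDist would indicate more similar utterances.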

Like I said at the beginning, I'm not a MATLAB user, but a quick Google search for "matlab dtw" pointed me to this page: https://www.mathworks.com/help/signal/ref/dtw.html, which documents dtw():

  dist = dtw(x,y) stretches two vectors, x and y, onto a common set of
  instants such that dist, the sum of the Euclidean distances between
  corresponding points, is smallest
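As a hedged illustration of how that function might be called on MFCC-like data: according to the same documentation page, dtw also accepts matrices with equal row counts and warps them along their columns, which matches the "feature vectors as columns" layout in the question. The matrices X and Y below are synthetic stand-ins, not real MFCCs, and the call is guarded because dtw requires the Signal Processing Toolbox.

```matlab
% Deterministic stand-ins for two MFCC matrices (13 coefficients per frame,
% one frame per column); the frame counts differ on purpose.
X = sin((1:13)' * (1:30) / 10);
Y = sin((1:13)' * (1:42) / 14);

if exist('dtw', 'file')   % dtw ships with the Signal Processing Toolbox
    % For matrix inputs, dtw warps along the columns; row counts must match.
    [dist, ix, iy] = dtw(X, Y);
    fprintf('DTW distance: %.3f over a path of %d steps\n', dist, numel(ix));
else
    disp('dtw not found; the Signal Processing Toolbox is required.');
end
```

Here ix and iy are the warping path: X(:, ix) and Y(:, iy) are the two series stretched onto a common time axis.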
欢心
#3 · 2019-04-16 11:56

I suggest using the standardized Euclidean distance instead of the plain Euclidean distance, because the MFCC coefficients have different ranges in each dimension.

For example, take the following 2-dimensional vectors: A(500, 4), B(504, 4) and C(502, 3). With the Euclidean distance, dist(A,C) < dist(A,B), but with the standardized Euclidean distance, dist(A,C) > dist(A,B), because each dimension's distance is normalized to its mean. Thus you get (504-500)/502 < (4-3)/3.5.

So, for MFCCs, it is better to use this normalization step for improved results.
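The example can be checked numerically with a short sketch. It follows the answer in scaling each dimension by its mean, but as an assumption it uses the mean over all three vectors rather than the pairwise means (502 and 3.5) quoted above. Note that MATLAB's own 'seuclidean' option for pdist2 scales by the standard deviation instead, so its numbers would differ.

```matlab
% Columns are the vectors A, B and C from the example above.
V = [500 504 502;
       4   4   3];
A = V(:, 1);  B = V(:, 2);  C = V(:, 3);

% Plain Euclidean distance: the first dimension dominates.
dAB = norm(A - B);          % 4
dAC = norm(A - C);          % sqrt(5), so C looks closer to A than B does

% Scale each dimension by its mean (assumption: mean over all 3 vectors).
s = mean(V, 2);
dABs = norm((A - B) ./ s);
dACs = norm((A - C) ./ s);  % now B is the closer one

fprintf('Euclidean: dAB = %.3f, dAC = %.3f\n', dAB, dAC);
fprintf('Scaled:    dAB = %.4f, dAC = %.4f\n', dABs, dACs);
```

For MFCCs, the same idea would mean rescaling each coefficient (each row of the feature matrix) before computing frame-to-frame distances.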
