-->

Simple word detector using MFCC

Posted 2019-09-07 21:47

Question:

I am implementing speech recognition software using Mel Frequency Cepstral Coefficients (MFCCs). In particular, the system must recognize a single specified word. From each audio file I get the MFCCs in a matrix with 12 rows (the MFCCs) and as many columns as the number of voice frames. I make the average of the rows, so I get a vector with only the 12 rows (the ith-row is the average of all ith-MFCCs of all frames). My question is how to train a classifier to detect the word? I have a training set with only positive samples: the MFCCs that I get from several audio files (several recordings of the same word).

Answer 1:

"I make the average of the rows, so I get a vector with only the 12 rows (the ith-row is the average of all ith-MFCCs of all frames)."

This is a very bad idea because you lose the temporal information about the word: averaging discards the order in which the frames occur. You need to analyze the whole MFCC sequence, not a summary of it.
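To see what averaging over frames throws away, here is a small sketch in plain Python. The matrices are made-up toy values (2 coefficients by 4 frames instead of 12 by N), and mean_over_frames is a hypothetical helper name. The second "word" is the first one played backwards, yet both collapse to the same averaged vector.

```python
def mean_over_frames(mfcc):
    """mfcc: list of rows, one per coefficient; each row holds per-frame values."""
    return [sum(row) / len(row) for row in mfcc]

# Toy MFCC matrices (made-up numbers, not real speech features).
word_a = [[1.0, 2.0, 3.0, 4.0],
          [0.5, 0.0, -0.5, 0.0]]
# Same frames in reverse time order: a different sound.
word_b = [row[::-1] for row in word_a]

sequences_differ = word_a != word_b
same_mean = mean_over_frames(word_a) == mean_over_frames(word_b)

print("sequences differ:", sequences_differ)
print("averaged vectors identical:", same_mean)
```

Any classifier trained only on the averaged vectors cannot tell these two sequences apart, which is why the frame sequence itself has to be modelled.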

"My question is how to train a classifier to detect the word?"

The simplest approach would be a GMM (Gaussian mixture model) classifier; you can check here:

http://www.mathworks.com/company/newsletters/articles/developing-an-isolated-word-recognition-system-in-matlab.html
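As a rough illustration of the idea (not the method from the linked article), the sketch below fits the simplest possible case, a GMM with a single diagonal-covariance component, to frames pooled from the positive recordings, and scores a new recording by its average per-frame log-likelihood. The function names, the toy 2-dimensional frames, and the variance floor are all assumptions made for this example; a real system would use 12-dimensional MFCC frames and several mixture components fitted with EM.

```python
import math

def fit_diag_gaussian(frames):
    """Fit per-dimension mean and variance to a list of frame vectors."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    variances = [
        # Floor the variance so the log-likelihood stays finite.
        max(sum((f[d] - means[d]) ** 2 for f in frames) / n, 1e-6)
        for d in range(dim)
    ]
    return means, variances

def avg_log_likelihood(frames, model):
    """Average per-frame log-likelihood under the diagonal Gaussian."""
    means, variances = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, means, variances):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)

# Toy frames pooled from the positive recordings (made-up values).
train_frames = [[1.0, 0.0], [1.2, 0.1], [0.8, -0.1], [1.1, 0.05], [0.9, -0.05]]
model = fit_diag_gaussian(train_frames)

similar = [[1.0, 0.02], [1.1, -0.03]]    # resembles the training frames
different = [[4.0, 2.0], [3.5, 2.5]]     # far from the training frames

score_similar = avg_log_likelihood(similar, model)
score_different = avg_log_likelihood(different, model)
print(score_similar > score_different)
```

With only positive samples available, detection then comes down to comparing the score against a threshold, which could be chosen from the scores of held-out recordings of the target word.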

For a more robust approach, you need to learn a more complex model such as an HMM (hidden Markov model). You can learn more about HMMs from a textbook like this one:

http://www.amazon.com/Fundamentals-Speech-Recognition-Lawrence-Rabiner/dp/0130151572
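To give a feel for why an HMM suits this task, here is a minimal sketch of the forward algorithm (in the log domain) for a left-to-right HMM with Gaussian emissions. Everything in it is an assumption for illustration: the observations are 1-dimensional stand-ins for MFCC vectors, and the three states with their means, variances and transition probabilities are set by hand. In practice these parameters would be estimated from the training recordings, for example with Baum-Welch.

```python
import math

def safe_log(p):
    return math.log(p) if p > 0 else -math.inf

def log_gauss(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def logsumexp(values):
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward_log_likelihood(obs, log_start, log_trans, means, variances):
    """Log-likelihood of the observation sequence under the HMM."""
    n = len(means)
    alpha = [log_start[s] + log_gauss(obs[0], means[s], variances[s])
             for s in range(n)]
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[p] + log_trans[p][s] for p in range(n)])
            + log_gauss(x, means[s], variances[s])
            for s in range(n)
        ]
    return logsumexp(alpha)

# Hand-set left-to-right model: the word passes through 3 sound segments.
means = [0.0, 5.0, 10.0]
variances = [1.0, 1.0, 1.0]
log_start = [safe_log(p) for p in [1.0, 0.0, 0.0]]
log_trans = [[safe_log(p) for p in row] for row in
             [[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]]]

word = [0.1, 0.2, 5.1, 4.9, 10.0, 9.8]   # segments in the expected order
reversed_word = word[::-1]               # same frames, wrong order

ll_word = forward_log_likelihood(word, log_start, log_trans, means, variances)
ll_reversed = forward_log_likelihood(reversed_word, log_start, log_trans,
                                     means, variances)
print(ll_word > ll_reversed)
```

Unlike the averaged vector, the HMM score depends on the order of the frames, so the reversed sequence receives a much lower likelihood.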