MATLAB code for a Gaussian Mixture Model with a large number of components

Posted 2019-04-13 16:03

I used the gaussmix function from the Voicebox MATLAB toolbox to fit a GMM. However, the code throws an error when I run it with 512 mixture components.

No_of_Clusters = 512;
No_of_Iterations = 10;
[m_ubm1,v_ubm1,w_ubm1]=gaussmix(feature,[],No_of_Iterations,No_of_Clusters);

Error using  * 
Inner matrix dimensions must agree.

Error in gaussmix (line 256)
pk=px*wt;                       % pk(k,1) effective number of data points for each mixture (could be    zero due to underflow)

I need 1024 or 2048 mixtures to build a Universal Background Model (UBM). Could anyone share MATLAB code that can fit a GMM with a large number of mixtures, such as 512 or 2048?

Thanks.

1 answer
你好瞎i
#2 · 2019-04-13 16:29

Do you want to use it for speech processing? If so, the best option is the MSR Identity Toolbox, written by Dr. Omid Sadjadi at Microsoft Research, who showed me how to use it (you will also need Voicebox). Here is an example code snippet that you can use to extract MFCCs from speech stored in WAV files (assuming a 16 kHz sample rate):

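addpath('path_to_voicebox');
addpath('path_to_identity_toolbox');
[s, fs] = audioread(speechFilename);   % replaces wavread, which newer MATLAB releases no longer provide
fL = 100.0/fs;                         % lower filterbank edge, as a fraction of fs
fH = 8000.0/fs;                        % upper filterbank edge, as a fraction of fs
fRate = 0.010 * fs;                    % frame shift: 10 ms in samples
fSize = 0.025 * fs;                    % frame length: 25 ms in samples
nChan = 27;                            % number of mel filterbank channels
nCeps = 12;                            % number of cepstral coefficients (c0 is added by the '0' option)
premcoef = 0.97;                       % pre-emphasis coefficient
s = rm_dc_n_dither(s, fs);             % Identity Toolbox: remove DC offset and add dither
s = filter([1 -premcoef], 1, s);       % pre-emphasis filter
mfc = melcepst(s, fs, '0dD', nCeps, nChan, fSize, fRate, fL, fH);   % Voicebox: MFCCs + c0 + deltas + delta-deltas
mfc = cmvn(mfc', true);                % Identity Toolbox: mean and variance normalization
writehtk(featureFilename, mfc', 100000, 9);   % Voicebox: write in HTK format (type code 9 = USER)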

The above code extracts 39-dimensional MFCCs (12 cepstral coefficients plus c0, together with their delta and delta-delta coefficients) from the pre-emphasized speech signal, applies mean and variance normalization to the features, and finally writes them to disk in HTK format. Note that this is only an example, and you may modify it to suit your needs and resources. The two functions "rm_dc_n_dither" and "cmvn" are from the Identity Toolbox. Both Voicebox and the Identity Toolbox must be on the MATLAB path (see the first two lines of the above code). For voice activity detection (VAD), you can use the "vadsohn" function from Voicebox, which outputs frame-level decisions (0 for silence and 1 for speech) at a 10 ms frame skip rate.
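To make the numbers above concrete, here is a small self-contained sketch that needs neither toolbox. It works out the frame settings for an assumed 16 kHz sample rate and then applies per-dimension mean and variance normalization to random stand-in data. The normalization is only an illustration of the idea; it is not the Identity Toolbox "cmvn" code, and the variable names feat, mu, sigma and featN are made up for this example.

```matlab
% Worked numbers for the settings above, assuming fs = 16000 Hz.
fs    = 16000;
fSize = 0.025 * fs;        % 25 ms frame length in samples
fRate = 0.010 * fs;        % 10 ms frame shift in samples
fH    = 8000.0 / fs;       % upper band edge as a fraction of fs (Nyquist)
nCeps = 12;
nDims = (nCeps + 1) * 3;   % (12 cepstra + c0) x (static, delta, delta-delta)

% Illustrative mean and variance normalization on stand-in data
% (rows = feature dimensions, columns = frames). NOT the toolbox code.
rng(0);
feat  = 5 + 2 * randn(nDims, 500);
mu    = mean(feat, 2);                 % per-dimension mean
sigma = std(feat, 0, 2);               % per-dimension standard deviation
featN = bsxfun(@rdivide, bsxfun(@minus, feat, mu), sigma);

fprintf('frame = %d samples, shift = %d samples, dims = %d\n', ...
        round(fSize), round(fRate), nDims);
```

After normalization each row of featN has approximately zero mean and unit standard deviation, which is what the "true" flag in the cmvn call above is meant to achieve.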

After you extract the features from your database, you may follow the procedures in gmm_ubm_demo provided with the Identity Toolbox to train a UBM model.
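As a rough sketch of what that training step might look like: the function name gmm_em and its argument order below are assumptions based on the toolbox demo, so check them against gmm_ubm_demo before relying on them. The features here are synthetic random data standing in for real MFCC files, and the mixture count is kept small so the example runs quickly; as far as I know the toolbox expects the number of mixtures to be a power of 2, which 512, 1024 and 2048 all satisfy.

```matlab
% Sketch only: gmm_em and its arguments are assumed from the Identity
% Toolbox demo; verify against gmm_ubm_demo before use.
nDims       = 39;    % matches the 39-dimensional MFCCs described above
nFiles      = 4;     % hypothetical number of feature files
nmix        = 8;     % small for illustration; use 512/1024/2048 on real data
final_niter = 10;    % EM iterations at the final mixture count
ds_factor   = 1;     % feature sub-sampling factor (1 = use every frame)
nWorkers    = 1;     % number of parallel workers

% Synthetic stand-in features, one cell per file, dims x frames (assumed layout).
rng(0);
dataList = cell(nFiles, 1);
for i = 1:nFiles
    dataList{i} = randn(nDims, 1000);
end

if exist('gmm_em', 'file')
    % Only runs when the Identity Toolbox is on the MATLAB path.
    ubm = gmm_em(dataList, nmix, final_niter, ds_factor, nWorkers);
else
    disp('gmm_em not found: add the Identity Toolbox to the path first.');
end
```

With real data you would replace the synthetic cell array with your own extracted features and raise nmix to the mixture count you need.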

If you would like to replicate our demo results on TIMIT, you can download the list files (not included in the toolbox) from the address below:

http://www.utdallas.edu/~sadjadi/lists.tar.gz

It is very easy, and you can do it on an ordinary PC.

Regards, Mohammad Karaminejad (karaminejad@gmail.com)
