I am working with the HTK toolkit on a word-spotting task and have a classic mismatch between training and testing data. The training data consists only of "clean" speech recorded over a microphone. It was converted to MFCC_E_D_A parameters, which were then modelled by phone-level HMMs. My test data, however, was recorded over landline and mobile phone channels, which introduce channel distortion and noise. Using MFCC_E_D_A parameters with HVite therefore produces incorrect output. I would like to apply cepstral mean normalisation via MFCC_E_D_A_Z parameters, but on its own that would not help much, since the HMMs were not trained on such data. My questions are as follows:
1. Is there any way to convert MFCC_E_D_A_Z parameters into MFCC_E_D_A? The pipeline would then be: input -> MFCC_E_D_A_Z -> MFCC_E_D_A -> HMM log-likelihood computation.
2. Is there any way to convert the existing HMMs, which model MFCC_E_D_A parameters, into HMMs for MFCC_E_D_A_Z?
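To make (1) concrete, here is my understanding of what the _Z qualifier computes (a minimal NumPy sketch of per-utterance cepstral mean normalisation, not HTK's actual code; the frame values are made up):

```python
import numpy as np

# Hypothetical cepstral frames: rows = frames, columns = cepstral coefficients.
frames = np.array([[1.0, 2.0],
                   [3.0, 4.0],
                   [5.0, 6.0]])

# CMN (the _Z qualifier): subtract the per-utterance mean of each coefficient.
mean = frames.mean(axis=0)
frames_z = frames - mean

# Undoing CMN requires the subtracted mean; adding it back restores the
# original parameters exactly.
restored = frames_z + mean
assert np.allclose(restored, frames)
```

If this is right, then inverting _Z needs the subtracted mean vector, which I don't believe is stored in the parameter file.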
If (1) is possible, what would the HCopy config file look like? I wrote the following HCopy config file for the conversion:
SOURCEFORMAT = MFCC_E_D_A_Z
TARGETKIND = MFCC_E_D_A
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = T
This does not work. How can I improve this?
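For comparison, the front-end configuration I use on the training side looks roughly like this (a sketch from memory; SOURCEFORMAT = WAV is an assumption about my clean recordings, and the analysis parameters are the same as above):

```
# HCopy config: clean waveform -> MFCC_E_D_A for HMM training
# (SOURCEFORMAT = WAV is an assumption; adjust to the actual audio format)
SOURCEFORMAT = WAV
TARGETKIND = MFCC_E_D_A
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = T
```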