I want to ask about the meaning of the 3-state phone model in an HMM. This is in the context of HMM theory as used in speech recognition systems, so the example is based on acoustic modeling of speech sounds with HMMs.
I got this example picture from a journal paper: http://www.intechopen.com/source/html/41188/media/image8_w.jpg
Figure 1: 3-State HMM for the sound /s/
So, my questions are:
- What is meant by "3 states"?
- What do S1, S2 and S3 actually mean? (I know they are states, but what do they represent?)
- How is the /s/ sound represented in these HMM states?
- Why 3? What happens if we have 4, 5 or more states?
- If /s/ is just a simple consonant sound, what do the states and transitions represent?
Does anyone have a simple explanation of this theory with an example (a graphical analogy)?
Thank you
Nick
The model that describes the phone /s/ consists of three states: S1, S2 and S3.
S1 represents the probability distribution of the feature vectors at the beginning of the phone /s/, S2 in the middle, and S3 at the end. Each probability distribution essentially captures the most probable value of the feature vector (how this part of the phone sounds) and its variation (the range over which it varies).
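To make the "most probable value plus variation" idea concrete, here is a minimal sketch that assumes each state is a single diagonal-covariance Gaussian over a 2-dimensional feature vector. The means, variances and the helper names (STATE_PARAMS, log_gaussian, best_state) are made up for illustration; a real recognizer would train these parameters from data and would typically use higher-dimensional features.

```python
import math

# Hypothetical per-state parameters: mean = most probable feature value,
# var = how much the feature varies around it. Values are invented.
STATE_PARAMS = {
    "S1": {"mean": [1.0, 0.0], "var": [0.5, 0.5]},  # beginning of /s/
    "S2": {"mean": [3.0, 2.0], "var": [0.2, 0.2]},  # stable middle
    "S3": {"mean": [1.5, 1.0], "var": [0.5, 0.5]},  # end of /s/
}

def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def best_state(x):
    """Return the state whose distribution scores the frame x highest."""
    return max(STATE_PARAMS, key=lambda s: log_gaussian(x, **STATE_PARAMS[s]))

if __name__ == "__main__":
    frame = [2.9, 1.8]  # one hypothetical feature vector
    for name, params in STATE_PARAMS.items():
        print(name, round(log_gaussian(frame, **params), 3))
    print("closest state:", best_state(frame))
```

A frame close to a state's mean gets a high score from that state, and a frame far away (relative to the variance) gets a low one. Note that picking the best state per frame is only an illustration; actual decoding also takes the transition probabilities into account.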
The /s/ sound is represented by the whole HMM, not by a single state.
In continuous speech recognition, the acoustics of a phone are affected by the preceding and the following phoneme. For that reason it is more accurate to split each phone into 3 parts: the transition from the previous phone at the beginning, a stable middle, and the transition to the next phone at the end. If the phone were isolated and stable, 1 state would be enough. It is also possible to use 5 states for a single phone in continuous speech, but this doesn't greatly improve accuracy.
See above. A transition carries the probability of moving from one state to another (or of staying in the same state); essentially, the transitions model the length of the phone.
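As a rough illustration of how transitions model length, here is a small sketch that assumes a strict left-to-right topology: each state either loops back to itself or moves on to the next state, with no skips. The self-loop probabilities and function names are invented for the example. With a self-loop probability a, the number of frames spent in a state follows a geometric distribution with mean 1 / (1 - a).

```python
# Hypothetical self-loop probabilities for S1, S2, S3 (not from any trained model).
SELF_LOOP = [0.5, 0.75, 0.5]

def transition_matrix(self_loop):
    """Build a left-to-right matrix: row i = state i, last column = exit."""
    n = len(self_loop)
    rows = []
    for i, a in enumerate(self_loop):
        row = [0.0] * (n + 1)
        row[i] = a            # stay in the same state for another frame
        row[i + 1] = 1.0 - a  # move on to the next state (or leave the phone)
        rows.append(row)
    return rows

def expected_frames(self_loop):
    """Mean number of frames spent in each state: 1 / (1 - a)."""
    return [1.0 / (1.0 - a) for a in self_loop]

if __name__ == "__main__":
    for row in transition_matrix(SELF_LOOP):
        print(row)
    per_state = expected_frames(SELF_LOOP)
    print("expected frames per state:", per_state)
    print("expected phone length in frames:", sum(per_state))
    print("minimum phone length in frames:", len(SELF_LOOP))
```

Under these assumptions, a higher self-loop probability means the model expects to stay longer in that part of the sound. The number of states also sets a minimum length: without skip transitions, a 3-state model needs at least 3 frames and a 5-state model at least 5.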