How do speech recognition algorithms recognize homophones?

Posted 2019-07-26 05:58

Question:

I was pondering this question earlier. What clues do modern algorithms (specifically those that convert voice to text) use to determine which homophone was said (e.g. to, too, or two)?

Do they use contextual clues? Sentence structure? Perhaps there are slight differences in the way each word is usually pronounced (for example, I usually hold the o sound longer in two than in to). A combination of the first two seems most plausible.

Answer 1:

Do they use contextual clues?

Yes, ASR systems use cross-word context. For example, if the previous word is "going", the next word is likely to be "to", not "two". ASR systems assign probabilities to the candidate word sequences and select the most probable decoding.
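To make the "going" + "to" example concrete, here is a minimal sketch of a bigram language model choosing among homophones. The tiny corpus, the add-one smoothing, and the function names (train_bigrams, bigram_prob, pick_homophone) are all assumptions for illustration; real ASR decoders use much larger models and search over whole sentences rather than one word at a time.

```python
from collections import Counter

# Toy training text (made up for illustration); real systems use huge corpora.
CORPUS = [
    "i am going to the store",
    "we are going to school",
    "she has two cats",
    "i want two apples",
    "that is too much",
    "me too",
]

def train_bigrams(sentences):
    """Count how often each word follows each context word."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        vocab.update(words)
        contexts.update(words[:-1])            # words that have a successor
        bigrams.update(zip(words, words[1:]))  # (previous, next) pairs
    return contexts, bigrams, vocab

def bigram_prob(prev, word, contexts, bigrams, vocab):
    """P(word | prev) with add-one smoothing so unseen pairs are not zero."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))

def pick_homophone(prev, candidates, contexts, bigrams, vocab):
    """Return the candidate that is most probable after the previous word."""
    return max(candidates,
               key=lambda w: bigram_prob(prev, w, contexts, bigrams, vocab))

contexts, bigrams, vocab = train_bigrams(CORPUS)
homophones = ["to", "too", "two"]
print(pick_homophone("going", homophones, contexts, bigrams, vocab))
print(pick_homophone("has", homophones, contexts, bigrams, vocab))
```

With this toy corpus the model prefers "to" after "going" and "two" after "has", simply because those pairs were seen in the training text.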

Sentence structure?

Yes, ASR systems also use more advanced language models to predict probable words given the context.

Perhaps there are slight differences in the way each word is usually pronounced (for example, I usually hold the o sound longer in two than in to).

That too. In fact, "too" and "to" are pronounced quite differently: "to" is often reduced to a schwa.

If you are interested in speech recognition algorithms, it may make sense to read a book on ASR or take an online course. For details, see:
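Pronunciation and context are usually weighed together. A rough sketch of the idea: each candidate gets an acoustic score (how well the audio matches that word's pronunciation) and a language model score (how likely the word is in context), and the decoder picks the best combination. The numbers below are invented for illustration, and combined_score, best_hypothesis, and lm_weight are hypothetical names, not any particular toolkit's API.

```python
import math

def combined_score(acoustic_logprob, lm_logprob, lm_weight=1.0):
    """Sum of log-probabilities; lm_weight scales the language model's influence."""
    return acoustic_logprob + lm_weight * lm_logprob

def best_hypothesis(hypotheses, lm_weight=1.0):
    """Return the word whose combined acoustic + language model score is highest."""
    best = max(hypotheses,
               key=lambda h: combined_score(h["acoustic"], h["lm"], lm_weight))
    return best["word"]

# Invented scores for a word heard with a full, unreduced vowel:
# the acoustics fit "too"/"two" better than a reduced "to",
# and the (assumed) context makes "too" the likeliest of the three.
hypotheses = [
    {"word": "to",  "acoustic": math.log(0.2), "lm": math.log(0.3)},
    {"word": "too", "acoustic": math.log(0.4), "lm": math.log(0.5)},
    {"word": "two", "acoustic": math.log(0.4), "lm": math.log(0.2)},
]

print(best_hypothesis(hypotheses))
```

Here the acoustics alone cannot separate "too" from "two", so the language model score breaks the tie.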

https://sourceforge.net/p/cmusphinx/discussion/speech-recognition/thread/3ea89abf/
