Question:

I'm playing with speech recognition. Is it possible to split speech into individual words?

If it is, could you recommend a library that supports splitting speech into words?

Thanks

Answer 1:

If you know what the speaker said, you can perform forced alignment to obtain the word (or phoneme) time alignments. Toolkits such as CMU Sphinx, HTK and Kaldi can do this. If you don't know what the speaker said, you can run standard speech recognition and use the timing information in its output to obtain the word boundaries, although the recognition output, and therefore the boundaries, may contain errors.
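Once an aligner or recognizer has produced word timings, cutting the audio into words is just slicing by time. The sketch below assumes the timings come as CTM-style text lines (`<utterance-id> <channel> <start> <duration> <word> [confidence]`), which is one common way such output is written; the exact format depends on your toolkit, so treat the parser and the sample `ctm` lines as hypothetical. It also assumes the audio is already loaded as a mono sequence of samples.

```python
def parse_ctm(lines):
    """Parse CTM-style lines into (word, start_sec, end_sec) tuples.

    Assumed layout: <utterance-id> <channel> <start> <duration> <word> [confidence]
    """
    words = []
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start, duration, word = float(fields[2]), float(fields[3]), fields[4]
        words.append((word, start, start + duration))
    return words


def cut_words(samples, sample_rate, word_times):
    """Slice a mono sample sequence into one (word, samples) segment per word."""
    segments = []
    for word, start, end in word_times:
        first = int(round(start * sample_rate))  # first sample of the word
        last = int(round(end * sample_rate))     # one past the last sample
        segments.append((word, samples[first:last]))
    return segments


# Hypothetical recognizer output for a single-channel utterance "utt1".
ctm = [
    "utt1 1 0.10 0.25 hello",
    "utt1 1 0.40 0.30 world 0.95",
]
audio = [0.0] * 16000  # placeholder: 2 s of silence at 8 kHz
for word, segment in cut_words(audio, 8000, parse_ctm(ctm)):
    print(word, len(segment))
```

With forced alignment the boundaries follow the known transcript; with plain recognition they inherit any recognition errors, as noted above.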

Answer 2:

Without any prior information about which phrase was spoken, this task is quite challenging. One approach is to apply voice activity detection (VAD) to the speech and split the audio into words at the pauses. However, in spontaneous speech people often don't pause between some words, so this approach will definitely run into problems.

Some VAD libraries are suggested here.
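To illustrate the pause-based idea, here is a minimal sketch that uses a plain frame-energy threshold as the VAD. This is a deliberately simple stand-in, not one of the libraries mentioned above, which use more robust detection. It assumes mono float samples in roughly [-1, 1]; the frame size (20 ms), energy threshold (0.01) and minimum pause length (200 ms) are illustrative guesses that would need tuning for real recordings.

```python
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    energies = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def split_on_pauses(samples, sample_rate, frame_ms=20, threshold=0.01, min_pause_ms=200):
    """Return (start, end) sample indices of speech chunks separated by pauses.

    A frame counts as speech when its energy is >= threshold; a chunk is closed
    once at least min_pause_ms of consecutive non-speech frames has been seen.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    min_pause_frames = max(1, int(min_pause_ms / frame_ms))
    voiced = [e >= threshold for e in frame_energies(samples, frame_len)]

    segments = []
    start = None     # first frame of the chunk in progress
    silence_run = 0  # consecutive silent frames since the last voiced one
    for i, is_voiced in enumerate(voiced):
        if is_voiced:
            if start is None:
                start = i
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= min_pause_frames:
                end = i - silence_run + 1  # frame just after the last voiced frame
                segments.append((start * frame_len, end * frame_len))
                start, silence_run = None, 0
    if start is not None:  # speech ran until the end of the signal
        segments.append((start * frame_len, (len(voiced) - silence_run) * frame_len))
    return segments


def tone(seconds, sample_rate, freq=440.0, amp=0.5):
    """Synthetic 'speech' stand-in: a sine tone."""
    n = int(round(seconds * sample_rate))
    return [amp * math.sin(2 * math.pi * freq * k / sample_rate) for k in range(n)]


def silence(seconds, sample_rate):
    return [0.0] * int(round(seconds * sample_rate))


sr = 8000
signal = silence(0.3, sr) + tone(0.5, sr) + silence(0.3, sr) + tone(0.4, sr) + silence(0.2, sr)
for start, end in split_on_pauses(signal, sr):
    print(f"speech from {start / sr:.2f}s to {end / sr:.2f}s")
```

The minimum pause length keeps short dips in energy from splitting a word, but it also shows the limitation described above: two words spoken without a gap of at least that length come out as a single chunk.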