How to determine position of recognized words of S

Posted 2019-04-14 16:59

I am exploring the SpeechRecognitionEngine's capabilities. My end goal is to take a WAV file and its transcription as input, and to output the position in the WAV file where each word begins (and, ideally, where it ends).

I can get the engine to recognize the phrase successfully, but I cannot figure out how to retrieve the audio position at which each word starts, as opposed to the time when the recognition was hypothesized, recognized, etc.

If you're curious about the purpose, it is to automate lip-sync animation workflows.

Thanks for your time.

1 answer
在下西门庆
#2 · 2019-04-14 17:49

Proper audio-to-text alignment is a task that requires specific algorithms, different from those used for speech recognition. You can emulate some alignment functionality with an ASR engine, but it will not work well.
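To illustrate what such an alignment algorithm does differently from free recognition: a forced aligner typically runs a Viterbi search constrained to the known transcript, so each audio frame may only stay in the current word or advance to the next one. Below is a minimal word-level sketch of that idea in Python. It assumes you already have a hypothetical matrix `scores[t][w]` of per-frame log-likelihoods for each transcript word (in a real system an acoustic model supplies these, usually at the phone or HMM-state level) and an assumed 10 ms frame shift. The names `force_align` and `frames_to_seconds` are illustrative only, not part of CMUSphinx or any other toolkit.

```python
import math
from typing import List, Tuple


def force_align(scores: List[List[float]]) -> List[Tuple[int, int]]:
    """Viterbi-style forced alignment at word level (simplified sketch).

    scores[t][w] is a hypothetical log-likelihood that audio frame t belongs
    to transcript word w. Words must appear in transcript order, so each
    frame either stays in the current word or advances to the next one.
    Returns (start_frame, end_frame_exclusive) for every word.
    """
    n_frames = len(scores)
    n_words = len(scores[0]) if scores else 0
    if n_words == 0 or n_frames < n_words:
        raise ValueError("need at least one word and one frame per word")

    neg_inf = -math.inf
    # best[t][w]: best total score of any path that puts frame t in word w
    best = [[neg_inf] * n_words for _ in range(n_frames)]
    # back[t][w]: word occupied at frame t - 1 on that best path
    back = [[0] * n_words for _ in range(n_frames)]
    best[0][0] = scores[0][0]  # the first frame must belong to the first word

    for t in range(1, n_frames):
        for w in range(n_words):
            stay = best[t - 1][w]
            advance = best[t - 1][w - 1] if w > 0 else neg_inf
            if stay >= advance:
                best[t][w], back[t][w] = stay + scores[t][w], w
            else:
                best[t][w], back[t][w] = advance + scores[t][w], w - 1

    # Backtrace from the last frame, which must belong to the last word.
    path = [0] * n_frames
    w = n_words - 1
    for t in range(n_frames - 1, -1, -1):
        path[t] = w
        w = back[t][w]

    # Collapse the per-frame path into one contiguous segment per word.
    segments = []
    for word in range(n_words):
        frames = [t for t, w_t in enumerate(path) if w_t == word]
        segments.append((frames[0], frames[-1] + 1))
    return segments


def frames_to_seconds(segments, frame_shift=0.01):
    """Convert frame indices to seconds, assuming a fixed hop (10 ms here)."""
    return [(start * frame_shift, end * frame_shift) for start, end in segments]


if __name__ == "__main__":
    # Toy scores for a 3-word transcript over 6 frames (made-up numbers).
    scores = [
        [0, -5, -5],
        [0, -5, -5],
        [-5, 0, -5],
        [-5, 0, -5],
        [-5, 0, -5],
        [-5, -5, 0],
    ]
    print(force_align(scores))  # [(0, 2), (2, 5), (5, 6)]
    print(frames_to_seconds(force_align(scores)))
```

Each word gets exactly one contiguous span because the search only allows "stay" or "advance by one" transitions; that constraint is what turns recognition into alignment. A real aligner applies the same constraint over phones or HMM states and usually adds optional silence between words, which this sketch omits.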

For implementations of alignment algorithms, you can check the CMUSphinx speech recognition toolkit:

http://cmusphinx.sourceforge.net/?s=long+audio+alignment

http://www.bluevincent.com/2011/02/speech-to-text-using-java.html

Alternatively, you can try a commercial service such as the one from Nexiwave:

http://nexiwave.com/index.php/applications/transcription-timestamping
