I am exploring the capabilities of SpeechRecognitionEngine. My end goal is to feed in a WAV file plus a transcription of that file, and get back the position in the audio where each word begins (and, ideally, ends).
I can get the engine to recognize the phrase successfully, but I cannot work out how to retrieve the audio position at which each word starts — as opposed to the time at which the recognition was hypothesized or completed.
If you're curious what the point of this is: I'm automating lipsync animation workflows.
Thanks for your time.
Proper audio-to-text alignment is a task that requires specific algorithms, distinct from plain speech recognition. You can emulate some alignment functionality with an ASR engine, but it will not work well.
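That said, if you want to try the emulation route with System.Speech, one approach is to constrain the engine with a grammar built from your known transcript and then ask the recognition result for the audio slice attributed to each word via `RecognitionResult.GetAudioForWordRange`. A minimal sketch (the file path and transcript are placeholders; accuracy of the per-word boundaries is not guaranteed):

```csharp
using System;
using System.Speech.Recognition;

class WordTimings
{
    static void Main()
    {
        // Placeholder transcript -- substitute the text of your WAV file.
        var transcript = "hello world";

        using (var engine = new SpeechRecognitionEngine())
        {
            // Constrain the engine to the known phrase so it only has to
            // align the words, not guess them.
            engine.LoadGrammar(new Grammar(new GrammarBuilder(transcript)));
            engine.SetInputToWaveFile("speech.wav"); // placeholder path

            RecognitionResult result = engine.Recognize();
            if (result == null)
            {
                Console.WriteLine("No recognition result.");
                return;
            }

            foreach (RecognizedWordUnit word in result.Words)
            {
                // GetAudioForWordRange returns the audio segment the engine
                // attributed to this word; AudioPosition is the offset from
                // the start of the input stream.
                RecognizedAudio audio = result.GetAudioForWordRange(word, word);
                Console.WriteLine("{0}: start={1}, end={2}",
                    word.Text,
                    audio.AudioPosition,
                    audio.AudioPosition + audio.Duration);
            }
        }
    }
}
```

Expect the boundaries to be rough, since the engine's word segmentation is a by-product of recognition rather than a purpose-built alignment.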
For proper implementations of alignment algorithms, check the CMUSphinx speech recognition toolkit:
http://cmusphinx.sourceforge.net/?s=long+audio+alignment
http://www.bluevincent.com/2011/02/speech-to-text-using-java.html
Or you can try a commercial service, like the one from Nexiwave:
http://nexiwave.com/index.php/applications/transcription-timestamping