mp3 recognition using Sphinx 4

2019-06-10 03:58发布

问题:

Can we use mp3 files for the voice recognition process without using wav files? or can we generate a wav file from a mp3 and then do the voice recognition without a serious impact on the accuracy? The problem is I need to minimize the load transferred through the network in my application. Will the information which is lost in the conversion be a huge factor for accuracy?

回答1:

Can we use mp3 files for the voice recognition process without using wav files?

Not directly. To be able to recognize mp3 streams, you need to use java library to read mp3 and convert to pcm stream (tritonus-mp3, lameonj). You can also invoke ffmpeg as a separate process to decode.

or can we generate a wav file from a mp3 and then do the voice recognition without a serious impact on the accuracy?

Accuracy is affected in both cases, no matter where you decode mp3 file.

The problem is I need to minimize the load transferred through the network in my application. Will the information which is lost in the conversion be a huge factor for accuracy?

It's better to use losseless codec like flac for transfer. mp3 conversion degrades ASR accuracy. Another approach would be to calculate features on the client and transfer them to the server.

mp3 recognition using Sphinx 4

问题:

回答1:

收藏的人(0)

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮