How to reduce noise in a Sphinx4 application

Published 2019-03-06 00:30

Question:

I am new to sphinx4 and to speech recognition in general. I am building a speech application with sphinx4. The issue is noise: the program produces recognition results even when the user has not spoken, i.e. it converts "speech" to text when there is no speech input at all, which hurts accuracy.

The main question is how to implement noise reduction. The system detects input even when I don't say anything into the microphone, so I assume it is picking up noise.

I searched online for noise reduction, but there is hardly any clear information on it. Some pages mention a file named Denoise.java that is supposed to ship with sphinx4, but it is not present in sphinx4-1.0beta6.

Another file is WienerFilter.java; a Wiener filter is a type of filter used on noisy signals. However, I found no instructions for using or wiring in that file.

I have already added a few more words to hello.gram, the grammar file for the program. The phonetic representations of those extra words, generated by lmtool, have been added to the dictionary.

I am using Eclipse and sphinx4-1.0beta6.

There is one existing question on Stack Overflow, "How to activate noise cancellation in Sphinx4", but it has not been answered yet.

Answer 1:

Static noise cancellation with spectral subtraction is enabled by default in the latest version, sphinx4-5prealpha. You do not need to do anything special; just use the latest version.
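To give a rough idea of what spectral subtraction does, here is a minimal standalone sketch. It is not the sphinx4 implementation and uses no sphinx4 classes; the class name, the method names, and the `floor` parameter are all hypothetical, and the input is assumed to be per-frame magnitude spectra that you have already computed.

```java
import java.util.Arrays;

// Illustrative sketch only; not sphinx4 code. All names here are hypothetical.
public class SpectralSubtractionSketch {

    /** Averages the magnitude spectra of frames assumed to contain no speech. */
    static double[] estimateNoise(double[][] silentFrames) {
        double[] noise = new double[silentFrames[0].length];
        for (double[] frame : silentFrames) {
            for (int i = 0; i < noise.length; i++) {
                noise[i] += frame[i] / silentFrames.length;
            }
        }
        return noise;
    }

    /**
     * Subtracts the noise estimate from one frame, bin by bin.
     * Each bin is kept at or above floor * original magnitude so that
     * the result never goes negative.
     */
    static double[] subtractNoise(double[] magnitude, double[] noise, double floor) {
        if (magnitude.length != noise.length) {
            throw new IllegalArgumentException("spectrum sizes differ");
        }
        double[] cleaned = new double[magnitude.length];
        for (int i = 0; i < magnitude.length; i++) {
            cleaned[i] = Math.max(magnitude[i] - noise[i], floor * magnitude[i]);
        }
        return cleaned;
    }

    public static void main(String[] args) {
        double[][] silence = {{1.0, 2.0, 1.0}, {3.0, 2.0, 1.0}};
        double[] noise = estimateNoise(silence);
        double[] frame = {10.0, 2.5, 0.5};
        System.out.println(Arrays.toString(subtractNoise(frame, noise, 0.1)));
    }
}
```

Bins dominated by stationary background noise are pushed down toward the floor, while bins that carry much more energy than the noise estimate pass through almost unchanged.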

Follow the tutorial:

http://cmusphinx.sourceforge.net/wiki/tutorialsphinx4

In the latest version, extra speech is ignored if you use grammar decoding rather than large-vocabulary decoding with a language model. With grammar decoding, the recognizer should ignore everything that does not match the grammar; for words not in the grammar it should return the special word <unk>.
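On the application side, one way to act on this is to drop the `<unk>` tokens before using a result. The helper below is a hypothetical sketch, not part of sphinx4; it assumes the hypothesis arrives as a plain space-separated string (for example, what `SpeechResult.getHypothesis()` returns in sphinx4-5prealpha).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of sphinx4.
public class UnknownWordFilter {

    static final String UNKNOWN = "<unk>";

    /** Returns the hypothesis with every <unk> token removed. */
    static String stripUnknown(String hypothesis) {
        List<String> kept = new ArrayList<>();
        for (String token : hypothesis.trim().split("\\s+")) {
            if (!token.isEmpty() && !token.equals(UNKNOWN)) {
                kept.add(token);
            }
        }
        return String.join(" ", kept);
    }

    /** True when nothing but <unk> (or nothing at all) was recognized. */
    static boolean isOnlyNoise(String hypothesis) {
        return stripUnknown(hypothesis).isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(stripUnknown("<unk> hello <unk> world"));
        System.out.println(isOnlyNoise("<unk> <unk>"));
    }
}
```

A result for which `isOnlyNoise` returns true can simply be discarded, so background noise that the grammar rejects never reaches the rest of the application.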

Accuracy debugging is a complex process and requires a test recording that reproduces the problem. Without one, it is hard to suggest how to improve accuracy. Besides the test recording, you need to provide the models you use for decoding and any other information needed to reproduce your results.