How to recognize a phrase from a voice file

Posted 2020-02-11 06:24

Question:

How to get the engine to successfully recognize a phrase from a voice file (wav/mp3/etc..)?

For example, if I have a voice file and a written text of the content of the same file, I want the engine to recognize the written words in the voice file.

I tried to play around with the SpeechRecognitionEngine, but without success so far.

I'd appreciate any ideas, since this is my first time dealing with speech recognition techniques.

I've seen examples of speech-to-text using dictionaries, but I'm not sure how they can be useful here. I was thinking of converting the whole voice file to text and then simply looking for the specific phrase in that text, but I don't think that's the right way: it doesn't seem to make sense to convert, for example, 5 hours of voice to text. Another option might be to use the specific phrase as a "dictionary" and look for that item in the voice file.

Answer 1:

It seems you need to look for a specific word in a long file. This technique is called "keyword spotting". It is quite different from full speech recognition and far more efficient: you do not need to transcribe the whole file to search for a word in it, because you can scan through the file quickly. The Microsoft Speech Recognition engine has very limited support for keyword spotting.

Open source engines like CMUSphinx can be used to implement keyword spotting efficiently. For further reference, see the information on how to implement wake-up listening with pocketsphinx.

For more information on the underlying algorithms, see "Acoustic Keyword Spotting in Speech with Applications to Data Mining".



Answer 2:

According to the MSDN article Getting Started with Speech Recognition, the steps you need are the ones listed below (quoted from the article). Note the "create a recognition grammar" step; the article goes on to suggest using the GrammarBuilder or Choices classes.

A speech recognition application will typically perform the following basic operations:
- Start the speech recognizer.
- Create a recognition grammar.
- Load the grammar into the speech recognizer.
- Register for speech recognition event notification.
- Create a handler for the speech recognition event.



Answer 3:

If you are trying to convert audio files using the Microsoft speech engines, you have to use some care. First, the only format supported is WAV (it can be encoded as PCM, ALaw, or uLaw), but you must verify that your file is in a format supported by your recognizer. You also must verify the sample rate. The recognizers only support a fixed set of sample rates. On my machine,

  • 8 bits per sample
  • single channel mono
  • 22,050 samples per second
  • PCM encoding

works well. See https://stackoverflow.com/a/6203533/90236 for more information. You may have to resample or re-encode the WAV files using a tool like Audacity; see https://stackoverflow.com/a/9467044/90236.
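Before handing a file to the recognizer, it can help to read its header and compare it with the format you expect. The following is only a minimal sketch in Python using the standard-library wave module, not part of the Microsoft speech API. The names EXPECTED, describe_wav and format_mismatches are made up for this illustration, and the expected values are simply the settings listed above for one machine; your recognizer may accept a different set. Note that the wave module reads only PCM data, so an ALaw or uLaw file would raise wave.Error instead of being described.

```python
import wave

# Assumed target format: the settings reported above as working on one machine.
# This is an illustration, not a list of what every recognizer supports.
EXPECTED = {"channels": 1, "bits_per_sample": 8, "sample_rate": 22050}


def describe_wav(path):
    """Read channel count, sample size and sample rate from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,  # getsampwidth() is in bytes
            "sample_rate": wav.getframerate(),
        }


def format_mismatches(info, expected=EXPECTED):
    """Return {field: (actual, expected)} for every field that differs."""
    return {
        field: (info[field], wanted)
        for field, wanted in expected.items()
        if info[field] != wanted
    }
```

An empty result from format_mismatches(describe_wav("speech.wav")) would mean the file already matches the assumed format (the file name here is a placeholder); any entries it returns point at what needs to be resampled or re-encoded.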

A simple example to get you started is in SAPI and Windows 7 Problem.

Last (I always repeat this point, sorry), there is a great article about programming speech recognition in Windows .NET: http://msdn.microsoft.com/en-us/magazine/cc163663.aspx. It is a little out of date, but it is a great introduction.