I am new to the area of "voice recognition" in Android.
I have a requirement in my app for "speech recognition", so I am doing my homework.
I found that:
1. The Android SDK has support for this, and it uses Google voice recognition.
So from what I understand, whether we invoke the recognizer by an intent or use the SpeechRecognizer class, the actual recognition is done at Google's cloud server.
I tried sample apps using both methods, and the matching rate in both cases was very low.
(First of all, is my finding right? I didn't get the right match for most of the words/sentences I tried.)
Will there be any difference in output between these two methods (i.e. launching by intent vs. using the SpeechRecognizer class)?
Are all apps depending on this Google technology, where the voice is sent as audio data and recognized at the cloud server? I saw that Shazam uses a different technology, but they have their own database. Are there any other such technologies in use?
I saw many "Siri for Android" apps. Any notes on how these applications actually work?
Thanks a lot for your time and help.
1) You will get identical results when using either the RecognizerIntent or the SpeechRecognizer. The main difference is in the user interaction. The RecognizerIntent forces the user to go through a standard speech recognition procedure. With the SpeechRecognizer you get to control how the app collects speech and when it processes it. The advantage of RecognizerIntent is that it is easy to program and familiar to users. With SpeechRecognizer you can implement advanced things like listening for speech in the background. You also get better error reporting.
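For illustration, a bare-bones version of the intent-based approach might look like this (a sketch only; the request code, prompt text, and RecognizeActivity class name are arbitrary values I made up):

    import android.app.Activity;
    import android.content.Intent;
    import android.os.Bundle;
    import android.speech.RecognizerIntent;
    import java.util.ArrayList;

    public class RecognizeActivity extends Activity {
        private static final int REQUEST_SPEECH = 1234; // arbitrary request code

        @Override
        protected void onCreate(Bundle savedInstanceState) {
            super.onCreate(savedInstanceState);
            // Launch the stock recognition dialog; the audio is recorded
            // and sent to the server for transcription.
            Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
            intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                    RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
            intent.putExtra(RecognizerIntent.EXTRA_PROMPT, "Say something");
            startActivityForResult(intent, REQUEST_SPEECH);
        }

        @Override
        protected void onActivityResult(int requestCode, int resultCode, Intent data) {
            super.onActivityResult(requestCode, resultCode, data);
            if (requestCode == REQUEST_SPEECH && resultCode == RESULT_OK && data != null) {
                // The recognizer hands back a list of candidate transcriptions,
                // best guess first.
                ArrayList<String> results =
                        data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
                // match results against your own keywords here
            }
        }
    }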
Also, some words are easy for the recognizer to understand, like "apple", but some are hard, like "cumin", for various reasons. You will have to be clever with matching what Google returns to implement something reliable.
2) I'm not sure what you mean by their own database. Your app will have a "database" of sorts which you are trying to match against what the user says.
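As a rough sketch of that kind of matching (the keyword list here is just an invented example), you could normalize each candidate transcription and check it against your own word list:

    import java.util.Arrays;
    import java.util.List;
    import java.util.Locale;

    public class CommandMatcher {
        // The app's own "database" of words it cares about (invented examples).
        private static final List<String> KEYWORDS =
                Arrays.asList("play", "pause", "next", "previous");

        // Returns the first keyword found in any candidate transcription, or null.
        public static String match(List<String> candidates) {
            for (String candidate : candidates) {
                String normalized = candidate.toLowerCase(Locale.US).trim();
                for (String keyword : KEYWORDS) {
                    if (normalized.contains(keyword)) {
                        return keyword;
                    }
                }
            }
            return null;
        }
    }

Checking every candidate the recognizer returns, rather than just the top one, is one simple way to be more forgiving with the hard words.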
3) Probably a mix of natural language processing, user modeling, and techniques to emulate human dialogue. Or they are just a big bunch of hand-coded rules that make them look smart. My guess is that it is a lot of work to make something believable.
Check out some of my sample code here:
https://github.com/gmilette/Say-the-Magic-Word-
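Separately from that repo, here is a rough skeleton of the SpeechRecognizer route, which is where the background listening and the more specific error reporting come in (a sketch only; the class name is made up, and this path needs the RECORD_AUDIO permission in the manifest):

    import android.content.Context;
    import android.content.Intent;
    import android.os.Bundle;
    import android.speech.RecognitionListener;
    import android.speech.RecognizerIntent;
    import android.speech.SpeechRecognizer;
    import java.util.ArrayList;

    public class Listener implements RecognitionListener {
        private final SpeechRecognizer recognizer;

        public Listener(Context context) {
            recognizer = SpeechRecognizer.createSpeechRecognizer(context);
            recognizer.setRecognitionListener(this);
        }

        public void listen() {
            // Your app decides when to start collecting speech.
            Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
            intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                    RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
            recognizer.startListening(intent);
        }

        @Override
        public void onResults(Bundle results) {
            ArrayList<String> matches =
                    results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
            // candidate transcriptions, best guess first
        }

        @Override
        public void onError(int error) {
            // Specific error codes are reported here, e.g.
            // SpeechRecognizer.ERROR_NO_MATCH or ERROR_NETWORK.
        }

        // The remaining callbacks are required by the RecognitionListener interface.
        @Override public void onReadyForSpeech(Bundle params) {}
        @Override public void onBeginningOfSpeech() {}
        @Override public void onRmsChanged(float rmsdB) {}
        @Override public void onBufferReceived(byte[] buffer) {}
        @Override public void onEndOfSpeech() {}
        @Override public void onPartialResults(Bundle partialResults) {}
        @Override public void onEvent(int eventType, Bundle params) {}
    }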
Yes, you are on the right track. Here is a good article on speech recognition, and I think you will also find some information on this link interesting!