I am new to the area of "voice recognition" in Android.
I have a requirement in my app for "speech recognition", so I am doing my homework.
I found that:
1. The Android SDK has support for this, and it uses Google voice recognition.
So from what I understand, whether we invoke the recognizer by an intent or use the SpeechRecognizer class, the actual recognition is done at Google's cloud server.
I tried sample apps using both methods, and the matching rate in both cases was very low.
(First of all, is my finding right? I didn't get the right match for most of the words/sentences I tried.)
Will there be any difference in output between these two methods (i.e. launching by intent vs. using the SpeechRecognizer class)?
Are all apps depending on this Google technology, where the voice is sent as audio data and recognized at the cloud server? I saw that Shazam uses a different technology, but they have their own database. Are there any other such technologies in use?
I saw many "Siri for Android" apps. Any notes on how these applications actually work?
Thanks a lot for your time and help.
1) You will get identical results when using either the RecognizerIntent or the SpeechRecognizer. The main difference is in the user interaction. The RecognizerIntent forces the user to go through a standard speech recognition procedure. With the SpeechRecognizer you get to control how the app collects speech and when it processes it. The advantage of RecognizerIntent is that it is easy to program and familiar to users. With SpeechRecognizer you can implement advanced things like listening for speech in the background. You also get better error reporting.
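For illustration, a bare-bones version of the intent-based approach might look like this (a sketch only; the request code, prompt text, and RecognizeActivity class name are arbitrary values I made up):

    import android.app.Activity;
    import android.content.Intent;
    import android.os.Bundle;
    import android.speech.RecognizerIntent;
    import java.util.ArrayList;

    public class RecognizeActivity extends Activity {
        private static final int REQUEST_SPEECH = 1234; // arbitrary request code

        @Override
        protected void onCreate(Bundle savedInstanceState) {
            super.onCreate(savedInstanceState);
            // Launch the stock recognition dialog; the audio is recorded
            // and sent to the server for transcription.
            Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
            intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                    RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
            intent.putExtra(RecognizerIntent.EXTRA_PROMPT, "Say something");
            startActivityForResult(intent, REQUEST_SPEECH);
        }

        @Override
        protected void onActivityResult(int requestCode, int resultCode, Intent data) {
            super.onActivityResult(requestCode, resultCode, data);
            if (requestCode == REQUEST_SPEECH && resultCode == RESULT_OK && data != null) {
                // The recognizer hands back a list of candidate transcriptions,
                // best guess first.
                ArrayList<String> results =
                        data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
                // match results against your own keywords here
            }
        }
    }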
Also, some words are easy for the recognizer to understand, like "apple", but some are hard, like "cumin", for various reasons. You will have to be clever with matching what Google returns to implement something reliable.
2) I'm not sure what you mean by their own database. Your app will have a "database" of sorts which you are trying to match against what the user says.
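As a rough sketch of that kind of matching (the keyword list here is just an invented example), you could normalize each candidate transcription and check it against your own word list:

    import java.util.Arrays;
    import java.util.List;
    import java.util.Locale;

    public class CommandMatcher {
        // The app's own "database" of words it cares about (invented examples).
        private static final List<String> KEYWORDS =
                Arrays.asList("play", "pause", "next", "previous");

        // Returns the first keyword found in any candidate transcription, or null.
        public static String match(List<String> candidates) {
            for (String candidate : candidates) {
                String normalized = candidate.toLowerCase(Locale.US).trim();
                for (String keyword : KEYWORDS) {
                    if (normalized.contains(keyword)) {
                        return keyword;
                    }
                }
            }
            return null;
        }
    }

Checking every candidate the recognizer returns, rather than just the top one, is one simple way to be more forgiving with the hard words.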
3) Probably a mix of natural language processing, user modeling, and techniques to emulate human dialogue. Or they are just a big bunch of hand-coded rules that make them look smart. My guess is that it is a lot of work to make something believable.
Check out some of my sample code here:
https://github.com/gmilette/Say-the-Magic-Word-
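Separately from that repo, here is a rough skeleton of the SpeechRecognizer route, which is where the background listening and the more specific error reporting come in (a sketch only; the class name is made up, and this path needs the RECORD_AUDIO permission in the manifest):

    import android.content.Context;
    import android.content.Intent;
    import android.os.Bundle;
    import android.speech.RecognitionListener;
    import android.speech.RecognizerIntent;
    import android.speech.SpeechRecognizer;
    import java.util.ArrayList;

    public class Listener implements RecognitionListener {
        private final SpeechRecognizer recognizer;

        public Listener(Context context) {
            recognizer = SpeechRecognizer.createSpeechRecognizer(context);
            recognizer.setRecognitionListener(this);
        }

        public void listen() {
            // Your app decides when to start collecting speech.
            Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
            intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                    RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
            recognizer.startListening(intent);
        }

        @Override
        public void onResults(Bundle results) {
            ArrayList<String> matches =
                    results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
            // candidate transcriptions, best guess first
        }

        @Override
        public void onError(int error) {
            // Specific error codes are reported here, e.g.
            // SpeechRecognizer.ERROR_NO_MATCH or ERROR_NETWORK.
        }

        // The remaining callbacks are required by the RecognitionListener interface.
        @Override public void onReadyForSpeech(Bundle params) {}
        @Override public void onBeginningOfSpeech() {}
        @Override public void onRmsChanged(float rmsdB) {}
        @Override public void onBufferReceived(byte[] buffer) {}
        @Override public void onEndOfSpeech() {}
        @Override public void onPartialResults(Bundle partialResults) {}
        @Override public void onEvent(int eventType, Bundle params) {}
    }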
Yes, you are on the right track. Here is a good article on speech recognition, and I think you will also find some information on this link interesting!