I am looking for an iOS API (ideally free) that does speech recognition. I have seen a few posts on this ("iPhone speech recognition API?" and "free speech recognition engines for iOS?"), and after a bit of research I have gathered the SDKs that look most interesting:
- http://dragonmobile.nuancemobiledeveloper.com/public/index.php?task=home
- http://www.politepix.com/openears
- http://www.creaceed.com/ceedvocalsdk/ (not free :-\ )
- http://www.ispeech.org/
Do any of these really stand out from the crowd, and are any of them reasonably recent? How do they actually differ from each other?
If you only want to detect a few keywords, you should not look for a speech recognition API or service. This task is called keyword spotting, and it uses different algorithms than speech recognition. Speech recognition tries to find all the words that have been said, and because of that it consumes far more resources than keyword spotting. A keyword spotter only tries to find a few selected keywords or keyphrases. It is much simpler and much less resource-intensive.
The only way to achieve this functionality is to use an open source package such as OpenEars, which is powered by Pocketsphinx:
http://www.politepix.com/openears
OpenEars has the Rejecto plugin, which implements something similar.
Pocketsphinx itself has recently implemented effective open source keyword spotting too, but it has not made it into OpenEars yet. It is only available through the Pocketsphinx API: you need to create a kws search and set the target word to look for. I hope this functionality will reach OpenEars soon.
Nuance gives developers free access (but not for high volume). See http://www.masshightech.com/stories/2011/09/26/daily13-Nuance-tweaks-mobile-dev-program-with-free-access-to-Dragon.html or http://dragonmobile.nuancemobiledeveloper.com/public/index.php?task=home
Nuance services are typically offered commercially and require up-front fees and transaction fees. The interesting news above is that they now make low-volume use of their services available to developers for free. So for development, testing, and demonstration you can probably use the free Nuance services. However, unlike the Google services that come free with Android, if your app has thousands of users you will likely have to pay for Nuance services.
We have been developing the CeedVocal SDK since 2008. It is based on the Julius and FLite open source projects.
Here's some context: we wanted to build our speech recognition app (Vocalia) back in 2008 and basically picked Julius (we also considered Pocketsphinx, which appears to be good as well), then optimized its file format so that it would boot in 1-2 sec instead of 20 sec on the original iPhone. Then we dutifully trained our own acoustic models in 6 languages. We designed the API, and eventually decided to offer it to other developers as an SDK.
CeedVocal basically supports 2 modes of operation:
- matching of words (or small phrases)
- keyword spotting
In the first mode of operation, it tries to align the input speech to a word (or phrase) in its list of acceptable input. This forces the input to a pre-known word, even if the speech is something else. Accuracy is good. In the second mode of operation, it tries to pick out one of its keywords from a stream of speech. This is a more difficult case, and it can be less accurate.
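To make the difference between the two modes concrete, here is a toy sketch in Python. It is not CeedVocal's API or algorithm: it works on text rather than audio, and it uses string similarity from the standard library's difflib as a stand-in for acoustic scoring. The function names and the 0.8 threshold are made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in for an acoustic score: string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_closed_list(heard, phrases):
    """Mode 1 (hypothetical helper): force the input onto the closest
    entry in a fixed list. Something is always returned, even when the
    input is unrelated to every entry."""
    return max(phrases, key=lambda p: similarity(heard, p))


def spot_keywords(stream, keywords, threshold=0.8):
    """Mode 2 (hypothetical helper): scan a stream word by word and
    report (position, keyword) only where a word is similar enough.
    The threshold is an assumed tuning knob, not a real SDK setting."""
    hits = []
    for position, word in enumerate(stream.split()):
        best = max(keywords, key=lambda k: similarity(word, k))
        if similarity(word, best) >= threshold:
            hits.append((position, best))
    return hits


if __name__ == "__main__":
    commands = ["play music", "stop", "next track"]
    print(match_closed_list("play musik", commands))
    print(spot_keywords("please stop the music now", ["stop", "music"]))
```

Note how the first function never says "no match", which is why unrelated speech still gets forced onto a known word, while the second has to decide at every position whether a keyword is present at all. That extra decision is roughly why spotting tends to be the harder and less accurate case.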