Is there a well-known, established framework for C, Java, or PHP for building speech recognition applications? It should take microphone audio as input and recognize English words. For example, in pseudocode:
Speech s = new Speech();
s.input(micStream);
result = s.recognise("Hello");
if (result) { printf("Matched hello"); } else { printf("No match found"); }
Follow up:
Download this: sphinx4/1.0%20beta6/
Add the libraries
Copy & paste code:
a) Put the XML config file somewhere it can be loaded from the code:
https://gist.github.com/2551321
b) use this:
package edu.cmu.sphinx.demo.hellowrld;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;
import models.Tts; // my own text-to-speech helper, not part of Sphinx4

public class Speech {
    public static void main(String[] args) {
        // Load the config from the first argument, or from the classpath.
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            // /tmp/helloworld.config.xml
            cm = new ConfigurationManager(Speech.class.getResource("speech.config.xml"));
        }

        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        System.out.println("Say: (Hello | call) ( Naam | Baam | Caam | Some )");

        // Recognize utterances until the process is killed.
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestFinalResultNoFiller();
                System.out.println("You said: " + resultText + '\n');
                Tts ts = new Tts();
                try {
                    ts.load();
                    ts.say("Did you say: " + resultText);
                } catch (IOException ex) {
                    // TTS failure is ignored; recognition keeps running.
                }
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }
}
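Once the recognizer hands back a result string, you will probably want to turn it into something the program can act on. Here is a minimal sketch of that step, assuming the result text is a two-word phrase from the grammar shown in the prompt above ("(Hello | call) ( Naam | Baam | Caam | Some )"). The CommandParser class, its Command type, and the parse method are hypothetical names made up for illustration; they are not part of Sphinx4.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper, not part of Sphinx4: splits a recognized
// phrase into a verb and a name, based on the grammar in the prompt.
public class CommandParser {
    // Word lists copied from the "Say: ..." prompt in the code above.
    static final List<String> VERBS = Arrays.asList("hello", "call");
    static final List<String> NAMES = Arrays.asList("naam", "baam", "caam", "some");

    /** A parsed two-word command: a verb followed by a name. */
    public static final class Command {
        public final String verb;
        public final String name;

        Command(String verb, String name) {
            this.verb = verb;
            this.name = name;
        }
    }

    /** Returns the parsed command, or empty if the text is not in the grammar. */
    public static Optional<Command> parse(String resultText) {
        if (resultText == null) {
            return Optional.empty();
        }
        // Normalize case and whitespace before comparing against the word lists.
        String[] words = resultText.trim().toLowerCase(Locale.ROOT).split("\\s+");
        if (words.length != 2) {
            return Optional.empty();
        }
        if (!VERBS.contains(words[0]) || !NAMES.contains(words[1])) {
            return Optional.empty();
        }
        return Optional.of(new Command(words[0], words[1]));
    }

    public static void main(String[] args) {
        Optional<Command> command = parse("call baam");
        // prints "call -> baam"
        System.out.println(command.map(c -> c.verb + " -> " + c.name).orElse("no match"));
    }
}
```

In the loop above you would pass resultText to CommandParser.parse and act only when a command comes back. Anything outside the grammar (or an empty result) is treated as "no match" rather than guessed at.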
Try my C library, libsprec, which is built around Google's speech recognition engine:
http://github.com/H2CO3/libsprec
Hmm. An interesting topic. I haven't done any work around this sort of thing in ages, though I did spend quite a bit of time playing with some (fairly basic) speech recognition software on the Amiga many years ago. It's good fun, but not nearly as easy as your pseudo-code example makes it sound.
You're going to need a third-party API library for this. (I guess it's possible to write your own, but I don't think you're at the point where that's a feasible idea.)
There are a number of API libraries available; Google turned up several -- here's one of the results I got: http://en.wikipedia.org/wiki/Microsoft_Speech_API -- but you'll probably need to try a few till you get one which meets your needs.
The chances are it's going to be a commercial API, i.e. you'll have to pay for it. There may be some open source ones (I didn't see any in my cursory Googling, but I'm sure they exist), but they're likely to be a lot harder to use.
Once you've got a library that you're happy with, and you've written your code to interface with it, your work isn't done, because speech recognition is a notoriously tricky thing to work with.
Different accents are just the start of the problem. The gender of the speaker and the speed at which they speak also affect the ability to recognise what has been said. Humans are far better at recognising speech than computers, but even we struggle with some unfamiliar accents.
Speech recognition software typically needs to be trained to recognise specific words and phrases. You certainly wouldn't try to match against a string as in your example; you'd ask it to spot a specific one of the phrases it had been trained to recognise.
In short, it's a very big field, which you're clearly only just dipping your toe into. I hope it goes well for you, but I see a lot of research time in your immediate future!
Here are some other links which may help you:
http://www.codeproject.com/KB/vista/Vista_Speech_Recognition.aspx
http://www.lumenvox.com/products/speech_engine/
http://java.sun.com/products/java-media/speech/forDevelopers/jsapi-guide/Recognition.html
HTK is one of the more popular frameworks for C.
http://htk.eng.cam.ac.uk/
It is not easily used, but definitely is powerful.
Check this out: http://cmusphinx.sourceforge.net/
From watching these questions for a few months, I've seen most developer choices break down like this:
Windows folks - use the System.Speech features of .Net or Microsoft.Speech and install the free recognizers Microsoft provides. Windows 7 includes a full speech engine. Others are downloadable for free. There is a C++ API to the same engines known as SAPI. See http://msdn.microsoft.com/en-us/magazine/cc163663.aspx or http://msdn.microsoft.com/en-us/library/ms723627(v=vs.85).aspx. For more background on the Microsoft engines for Windows, see the question "What is the difference between System.Speech.Recognition and Microsoft.Speech.Recognition?"
Linux folks - Sphinx seems to have a good following. See http://cmusphinx.sourceforge.net/ and http://cmusphinx.sourceforge.net/wiki/
Commercial products - Nuance, Loquendo, AT&T, others
Online service - Nuance, Yapme, others
Of course this may also be helpful - http://en.wikipedia.org/wiki/List_of_speech_recognition_software
There is a Java speech API. See javax.speech.recognition in the Java Speech API http://java.sun.com/products/java-media/speech/forDevelopers/jsapi-guide/Recognition.html. I believe you still have to find a speech engine that supports this API. I don't think Sphinx fully supports it - http://cmusphinx.sourceforge.net/sphinx4/doc/Sphinx4-faq.html#support_jsapi
There are lots of other SO questions, for example "Need text to speech and speech recognition tools for Linux".
The J.A.R.V.I.S. Java Speech API is very robust and functional and a great minimalist alternative to Sphinx.
https://github.com/The-Shadow/java-speech-api