So, like many others, I decided to create my own speech-recognition engine. As it turned out, it's not easy at all; it's rather difficult for English in particular, because there is, I'd say, a dramatic difference between the way a word is written and the way it's pronounced. Being from Georgia, I decided to write speech recognition for the Georgian language. In Georgian, you pronounce words EXACTLY the way you write them. It's just like a transcription. Will this fact significantly ease my task? Or are there even more difficult... difficulties :D ?
Speech recognition is a complex domain with many specific algorithms, tools and methods. To create your own engine you could start with the CMUSphinx open source speech recognition toolkit, which will allow you to:
CMUSphinx already supports English, German, Spanish, French, Dutch, Russian, Mandarin, Icelandic, Italian and many other languages. It's very simple to add a new one. For new people it usually takes a month or two of concentrated work to implement the required process.
To get started visit the homepage:
http://cmusphinx.sourceforge.net
and read the tutorial
http://cmusphinx.sourceforge.net/wiki/tutorial
If you have any questions, please ask them on the forums or here!
And, it's a very common misconception that you just spell the sounds when you speak Georgian. It's not true for most of the languages in the world. To test the hypothesis, try recording some audio in an audio editor and check which sounds are actually pronounced. You'll be surprised. The tutorial above covers this question in detail.
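To make the "written exactly as pronounced" hypothesis concrete, here is a minimal sketch of what it would imply: a pronunciation dictionary built by mapping one letter to one phone, in the plain "word PHONE PHONE ..." layout that CMUSphinx dictionaries use. The letter table is deliberately partial, and the phone labels and function names are assumptions made up for illustration, not an official Georgian phone set. Comparing such output with what you actually hear in recordings is one way to see where the naive mapping breaks down.

```python
# Naive grapheme-to-phone sketch: one Georgian letter -> one phone label.
# ASSUMPTION: the table is partial and the phone labels are illustrative only;
# they are not taken from any existing CMUSphinx model.
LETTER_TO_PHONE = {
    "ა": "A", "ბ": "B", "გ": "G", "დ": "D", "ე": "E",
    "ვ": "V", "ზ": "Z", "ი": "I", "ლ": "L", "მ": "M",
    "ნ": "N", "ო": "O", "რ": "R", "ს": "S", "უ": "U",
}


def naive_phones(word):
    """Return one phone label per letter; raise KeyError for unmapped letters."""
    missing = [ch for ch in word if ch not in LETTER_TO_PHONE]
    if missing:
        raise KeyError("no phone for: " + " ".join(missing))
    return [LETTER_TO_PHONE[ch] for ch in word]


def dictionary_line(word):
    """Format one dictionary entry as 'word PHONE PHONE ...'."""
    return word + " " + " ".join(naive_phones(word))


if __name__ == "__main__":
    # Example words whose letters are all covered by the partial table above.
    for w in ["მამა", "დედა", "გული"]:
        print(dictionary_line(w))
```

A table like this is only a first guess; any sound that shows up in the recordings but not in the table is exactly the kind of surprise mentioned above.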
Do all people from Georgia sound absolutely the same? I think not... Lots of major problems in speech recognition are not directly related to the language itself:
etc.
Solving these things is always pretty hard... on top of that you have the language/pronunciation to take care of... I don't know Georgian, and what you describe might make the task a bit easier, but it will still be a hard task.
EDIT - as per comments:
Using good libraries might lower the time-frame and even help in quality... but not every library is good for speech recognition despite perhaps being brilliant on some other audio-related matters...
For reference see the Wikipedia article http://en.wikipedia.org/wiki/Speech_recognition - it has a good overview including some links and book references which are a good starting point...
As for how to design such an API see for example http://java.sun.com/products/java-media/speech/forDevelopers/jsapi-guide/Recognition.html