I have a very specific requirement. I am working on an application that will let users speak their employee number, which has a format like HN56C12345 (an arbitrary alphanumeric sequence), into the app. I have gone through http://cmusphinx.sourceforge.net/wiki/tutoriallm, but I am not sure whether it would work for my use case.
So my question is threefold:
- Can Sphinx4 actually recognize an alphanumeric sequence, such as an employee number in my case, with high accuracy?
- If yes, can anyone point me to a concrete example or reference page where someone has built custom language support in Sphinx4 from scratch? I haven't found a detailed step-by-step doc on this yet. Has anyone worked on dictionaries or language models for alphanumeric sequences?
- How do I build an acoustic model for this scenario?
You don't need a new acoustic model for this, but rather a custom grammar. See http://cmusphinx.sourceforge.net/wiki/tutoriallm#building_a_grammar and http://cmusphinx.sourceforge.net/doc/sphinx4/edu/cmu/sphinx/jsgf/JSGFGrammar.html to learn more. Sphinx4 recognizes characters just fine if you put them space-separated in the grammar:
#JSGF V1.0;
grammar jsgf.emplID;
<digit> = zero | one | two | three | four | five | six | seven | eight | nine ;
<digit2> = <digit> <digit> ;
<digit4> = <digit2> <digit2> ;
<digit5> = <digit4> <digit> ;
// This rule accepts IDs of the form hn<2 digits>c<5 digits>, e.g. "h n five six c one two three four five".
public <id> = h n <digit2> c <digit5> ;
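With a grammar like this, the recognizer returns the spoken words (letters and digit names), not the ID itself, so the hypothesis still has to be mapped back to something like HN56C12345. Here is a minimal sketch of that step in plain Java. The class name `SpokenIdDecoder` and the method `decode` are made up for illustration and are not part of Sphinx4; the sketch assumes the hypothesis is a space-separated string of the tokens used in the grammar above.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of Sphinx4): converts a recognized word
// sequence such as "h n five six c one two three four five" into an ID.
class SpokenIdDecoder {
    // Digit words exactly as they appear in the <digit> rule of the grammar.
    private static final Map<String, Character> DIGITS = Map.of(
            "zero", '0', "one", '1', "two", '2', "three", '3', "four", '4',
            "five", '5', "six", '6', "seven", '7', "eight", '8', "nine", '9');

    static String decode(String hypothesis) {
        StringBuilder id = new StringBuilder();
        for (String token : hypothesis.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            Character digit = DIGITS.get(token);
            if (digit != null) {
                // A digit word becomes the corresponding digit character.
                id.append(digit);
            } else if (token.length() == 1 && token.charAt(0) >= 'a' && token.charAt(0) <= 'z') {
                // A single spoken letter becomes its upper-case form.
                id.append(Character.toUpperCase(token.charAt(0)));
            } else {
                // Anything else is not covered by the grammar above.
                throw new IllegalArgumentException("Unexpected token: " + token);
            }
        }
        return id.toString();
    }
}
```

For example, `SpokenIdDecoder.decode("h n five six c one two three four five")` returns `HN56C12345`.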
As for accuracy, there are two ways to improve it. If the number of employees isn't too large, you can build the grammar from the list of all valid employee IDs, so that only existing IDs can be recognized. If that is not feasible, then a generic grammar is your only option. It is also possible to write a custom scorer that uses context information to predict the employee ID better than the generic algorithm, but this requires some knowledge of both ASR and the CMU Sphinx code.
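If you go with the first option, the grammar can be generated from your employee list rather than written by hand. The following is only a sketch under some assumptions: the class `EmployeeIdGrammar` and its methods are hypothetical names, the IDs contain only ASCII letters and digits, and the grammar name `jsgf.emplID` is simply reused from the example above.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical generator (not part of Sphinx4): builds a JSGF grammar
// that accepts exactly the given employee IDs, spelled out word by word.
class EmployeeIdGrammar {
    private static final String[] DIGIT_WORDS = {
            "zero", "one", "two", "three", "four",
            "five", "six", "seven", "eight", "nine"};

    // "HN56C12345" -> "h n five six c one two three four five"
    static String spell(String id) {
        StringJoiner words = new StringJoiner(" ");
        for (char ch : id.toCharArray()) {
            if (ch >= '0' && ch <= '9') {
                words.add(DIGIT_WORDS[ch - '0']);
            } else if ((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')) {
                words.add(String.valueOf(Character.toLowerCase(ch)));
            } else {
                throw new IllegalArgumentException("Unsupported character in ID: " + ch);
            }
        }
        return words.toString();
    }

    // One alternative per known ID, joined with '|' in a single public rule.
    static String build(List<String> ids) {
        if (ids.isEmpty()) {
            throw new IllegalArgumentException("At least one ID is required");
        }
        StringJoiner alternatives = new StringJoiner("\n    | ", "public <id> = ", " ;");
        for (String id : ids) {
            alternatives.add(spell(id));
        }
        return "#JSGF V1.0;\n\ngrammar jsgf.emplID;\n\n" + alternatives + "\n";
    }
}
```

Calling `EmployeeIdGrammar.build(List.of("HN56C12345", "AB01C00007"))` produces a grammar whose `<id>` rule has one alternative per ID. You would write the result to a .gram file and load it the same way as the hand-written grammar. The trade-off is that the grammar has to be regenerated whenever the employee list changes.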