I'm trying to follow this tutorial, but my program crashes on startup after logging a number of problems with the dictionary and models, such as:
The dictionary is missing a phonetic transcription for the word 'humphrey'
and
Dec 18, 2014 1:14:50 PM edu.cmu.sphinx.linguist.lextree.HMMTree addPronunciation
SEVERE: Missing HMM for unit T with lc=N rc=EH1
13:14:50.601 SEVERE lexTreeLinguist Bad HMM Unit: EH1
I loaded this dictionary and got the language and acoustic models from their SourceForge page.
It then crashes with this:
Exception in thread "main" java.lang.NullPointerException
at edu.cmu.sphinx.linguist.lextree.HMMNode.getBaseUnit(HMMTree.java:506)
at edu.cmu.sphinx.linguist.lextree.HMMNode.<init>(HMMTree.java:484)
at edu.cmu.sphinx.linguist.lextree.Node.addSuccessor(HMMTree.java:165)
at edu.cmu.sphinx.linguist.lextree.HMMTree$EntryPoint.createEntryPointMap(HMMTree.java:1163)
at edu.cmu.sphinx.linguist.lextree.HMMTree$EntryPointTable.createEntryPointMaps(HMMTree.java:1021)
at edu.cmu.sphinx.linguist.lextree.HMMTree.compile(HMMTree.java:795)
at edu.cmu.sphinx.linguist.lextree.HMMTree.<init>(HMMTree.java:716)
at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.generateHmmTree(LexTreeLinguist.java:433)
at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.compileGrammar(LexTreeLinguist.java:420)
at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.allocate(LexTreeLinguist.java:337)
at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.allocate(WordPruningBreadthFirstSearchManager.java:232)
at edu.cmu.sphinx.decoder.AbstractDecoder.allocate(AbstractDecoder.java:92)
at edu.cmu.sphinx.recognizer.Recognizer.allocate(Recognizer.java:167)
at edu.cmu.sphinx.api.LiveSpeechRecognizer.startRecognition(LiveSpeechRecognizer.java:46)
at com.test.sphinxtest.App.main(App.java:25)
Here's my code.
package com.test.sphinxtest;

import java.io.IOException;

import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.LiveSpeechRecognizer;
import edu.cmu.sphinx.api.SpeechResult;

/**
 * Hello world!
 *
 */
public class App
{
    public static void main( String[] args )
    {
        Configuration configuration = new Configuration();
        configuration.setAcousticModelPath("models/acousticmodel/en-us");
        configuration.setDictionaryPath("dictionary/cmudict-0.6d");
        configuration.setLanguageModelPath("models/languagemodel/en-us.lm");
        try {
            LiveSpeechRecognizer recognizer = new LiveSpeechRecognizer(configuration);
            recognizer.startRecognition(true);
            SpeechResult result = recognizer.getResult();
            recognizer.stopRecognition();
            while ((result = recognizer.getResult()) != null) {
                System.out.println(result.getHypothesis());
            }
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}
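The dictionary has to use the same phone set as the acoustic model. The "Missing HMM for unit T with lc=N rc=EH1" and "Bad HMM Unit: EH1" errors show that your dictionary contains stress-marked phones such as EH1, for which the en-us model has no HMM. The correct dictionary should not have stress marks. You can download it from here: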
https://raw.githubusercontent.com/cmusphinx/pocketsphinx/master/model/en-us/cmudict-en-us.dict
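If you want to confirm that stress marks are the problem before swapping files, a quick check along these lines may help. This is only a rough sketch: the class and method names are made up for illustration, the sample line is an illustrative entry rather than one copied from your file, and it assumes each dictionary line is a word followed by whitespace-separated phones, with stress written as a trailing digit 0-2. Stripping the digits yourself is a diagnostic at best; using the dictionary linked above is the safer fix.

```java
import java.util.regex.Pattern;

public class DictStressCheck {
    // A phone followed by a stress digit, e.g. EH1 or IY0 (assumed format).
    private static final Pattern STRESSED_PHONE = Pattern.compile("^[A-Z]+[0-2]$");

    /** Returns true if any phone on a "WORD PH PH ..." line carries a stress digit. */
    static boolean hasStressMarks(String line) {
        String[] tokens = line.trim().split("\\s+");
        for (int i = 1; i < tokens.length; i++) { // token 0 is the word itself
            if (STRESSED_PHONE.matcher(tokens[i]).matches()) {
                return true;
            }
        }
        return false;
    }

    /** Removes trailing stress digits from the phones; the word (token 0) is left untouched. */
    static String stripStress(String line) {
        String[] tokens = line.trim().split("\\s+");
        StringBuilder out = new StringBuilder(tokens[0]);
        for (int i = 1; i < tokens.length; i++) {
            out.append(' ').append(tokens[i].replaceAll("[0-2]$", ""));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Illustrative entry in the stress-marked style.
        String stressed = "HUMPHREY  HH AH1 M F R IY0";
        System.out.println(hasStressMarks(stressed));
        System.out.println(stripStress(stressed));
    }
}
```

Running hasStressMarks over the first few hundred lines of the file you pass to setDictionaryPath should tell you quickly whether it is the stress-marked variant.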