I am trying to specify keywords in Watson's Speech-To-Text Unity SDK, but I'm unsure how to do this.
The details page doesn't show an example (see here: https://www.ibm.com/watson/developercloud/doc/speech-to-text/output.shtml), and other forum posts are written for Java applications (see here: How to specify phonetic keywords for IBM Watson speech2text service?).
I've tried hard-coding these values in the RecognizeRequest class created in the "Recognize" function like so, but without success:
**EDIT: this function never gets called.**
public bool Recognize(AudioClip clip, OnRecognize callback)
{
    if (clip == null)
        throw new ArgumentNullException("clip");
    if (callback == null)
        throw new ArgumentNullException("callback");
    RESTConnector connector = RESTConnector.GetConnector(SERVICE_ID, "/v1/recognize");
    if (connector == null)
        return false;
    RecognizeRequest req = new RecognizeRequest();
    req.Clip = clip;
    req.Callback = callback;
    req.Headers["Content-Type"] = "audio/wav";
    req.Send = WaveFile.CreateWAV(clip);
    if (req.Send.Length > MAX_RECOGNIZE_CLIP_SIZE)
    {
        Log.Error("SpeechToText", "AudioClip is too large for Recognize().");
        return false;
    }
    req.Parameters["model"] = m_RecognizeModel;
    req.Parameters["continuous"] = "false";
    req.Parameters["max_alternatives"] = m_MaxAlternatives.ToString();
    req.Parameters["timestamps"] = m_Timestamps ? "true" : "false";
    req.Parameters["word_confidence"] = m_WordConfidence ? "true" : "false";
    // These "keywords", "keywordsThreshold" and "keywords_threshold" parameters
    // are just my guess at how to set these values.
    req.Parameters["keywords"] = new string[] { "fun", "match", "test" };
    req.Parameters["keywordsThreshold"] = .2;
    req.Parameters["keywords_threshold"] = .2;
    // End of my test insertions.
    req.OnResponse = OnRecognizeResponse;
    return connector.Send(req);
}
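For comparison, the HTTP /v1/recognize interface documents keywords as a single comma-separated list of strings and keywords_threshold as a number between 0 and 1 (the REST parameter is spelled keywords_threshold; there is no keywordsThreshold query parameter). Whether a string[] or a double placed in req.Parameters reaches the service in that form depends on how the SDK's connector turns parameter values into the query string, which I haven't verified; if it simply calls ToString(), the array would arrive as "System.String[]". Per the EDIT note, this method isn't on the code path anyway, so this only matters if you do call Recognize(). The sketch below is plain .NET, and KeywordParams, Build and ToQueryString are hypothetical names, not SDK APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper (not part of the Watson Unity SDK) that formats keyword
// spotting parameters the way the HTTP /v1/recognize query string expects them.
public static class KeywordParams
{
    public static Dictionary<string, string> Build(IEnumerable<string> keywords, double threshold)
    {
        if (keywords == null)
            throw new ArgumentNullException(nameof(keywords));
        // keywords_threshold is a confidence lower bound between 0.0 and 1.0.
        if (threshold < 0.0 || threshold > 1.0)
            throw new ArgumentOutOfRangeException(nameof(threshold), "keywords_threshold must be between 0 and 1.");

        string[] cleaned = keywords
            .Where(k => !string.IsNullOrWhiteSpace(k))
            .Select(k => k.Trim())
            .ToArray();

        return new Dictionary<string, string>
        {
            // One comma-separated string rather than a string[].
            ["keywords"] = string.Join(",", cleaned),
            // Invariant culture so 0.2 is never written as "0,2" on, e.g., a German locale.
            ["keywords_threshold"] = threshold.ToString(CultureInfo.InvariantCulture),
        };
    }

    // URL-encodes the parameters into a query string (handy for logging the request).
    public static string ToQueryString(IReadOnlyDictionary<string, string> parameters) =>
        string.Join("&", parameters.Select(p =>
            Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value)));
}
```

Plugged into the Recognize() method above, that would mean assigning the two strings from Build(...) to req.Parameters["keywords"] and req.Parameters["keywords_threshold"] and dropping the keywordsThreshold line, assuming req.Parameters accepts strings as the existing "model" and "continuous" lines suggest.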
But the returned SpeechRecognitionEvent result value does not contain any keywords_result, and that is what I'm after. I'm trying to view the confidence for each keyword in the keywords_result object like so, but the keywords_result object comes back as null:
private void OnRecognize(SpeechRecognitionEvent result) {
    Debug.Log("Recognizing!");
    m_ResultOutput.SendData(new SpeechToTextData(result));
    if (result != null && result.results != null && result.results.Length > 0) {
        if (m_Transcript != null)
            m_Transcript.text = "";
        foreach (var res in result.results) {
            // res.keywords_result comes back as null, so guard before iterating
            // instead of throwing a NullReferenceException.
            if (res.keywords_result == null || res.keywords_result.keyword == null) {
                Debug.Log("No keywords_result in this result.");
                continue;
            }
            foreach (var keyword in res.keywords_result.keyword) {
                string text = keyword.normalized_text;
                float confidence = keyword.confidence;
                Debug.Log(text + ": " + confidence);
            }
        }
    }
}
Has anyone successfully implemented Keyword Confidence Evaluation with Watson's Speech-To-Text SDK in Unity or C#? All ideas and suggestions are welcome.
PS This is my first post :)
Turns out I needed to specify the keywords in the "SendStart" function like so:
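As a rough sketch of what a SendStart change like this has to produce (not the actual SDK code): when recognizing over the WebSocket interface, parameters travel in the JSON "start" text message, where the service documentation shows keywords as a JSON array of strings and keywords_threshold as a number. StartMessage.Build is a hypothetical helper; it uses System.Text.Json, which ships with .NET Core 3.0 and later but not with Unity by default, so in a Unity project you would use whichever JSON serializer your SDK version bundles. The content type in the usage is only an example value.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper, not the Watson Unity SDK's SendStart: builds the JSON
// "start" text message that opens a streaming recognition request.
public static class StartMessage
{
    public static string Build(string contentType, string[] keywords, double keywordsThreshold)
    {
        var message = new Dictionary<string, object>
        {
            ["action"] = "start",
            ["content-type"] = contentType,
        };

        // Add keyword spotting only when there are keywords; the service
        // expects a threshold whenever keywords are supplied.
        if (keywords != null && keywords.Length > 0)
        {
            message["keywords"] = keywords;                    // serialized as a JSON array
            message["keywords_threshold"] = keywordsThreshold; // serialized as a JSON number
        }

        return JsonSerializer.Serialize(message);
    }
}
```

For example, StartMessage.Build("audio/l16;rate=16000", new[] { "fun", "match", "test" }, 0.2) yields a start message with "keywords" as an array and "keywords_threshold" as 0.2, unlike the HTTP query string, where the keywords are one comma-separated value.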
and write some code to parse the keywords_result properly in the "ParseRecognizeResponse" function:
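As a sketch of the parsing side (again, not the author's ParseRecognizeResponse code): in the raw JSON response, each element of results can carry a keywords_result object that maps every requested keyword to an array of matches, each with normalized_text, start_time, end_time and confidence. The helper below walks that shape with System.Text.Json's JsonDocument and flattens it into a list. KeywordMatch and KeywordParser are hypothetical names; the SDK's own model classes (the question's OnRecognize code reads a keywords_result.keyword array) are organized differently.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical flat record for one spotted keyword occurrence.
public sealed record KeywordMatch(string Keyword, string NormalizedText,
                                  double StartTime, double EndTime, double Confidence);

public static class KeywordParser
{
    // Walks results[*].keywords_result in a recognize JSON response and returns
    // every match; results without a keywords_result object are skipped.
    public static List<KeywordMatch> Parse(string responseJson)
    {
        var matches = new List<KeywordMatch>();
        using JsonDocument doc = JsonDocument.Parse(responseJson);

        if (!doc.RootElement.TryGetProperty("results", out JsonElement results) ||
            results.ValueKind != JsonValueKind.Array)
            return matches;

        foreach (JsonElement result in results.EnumerateArray())
        {
            if (!result.TryGetProperty("keywords_result", out JsonElement keywordsResult) ||
                keywordsResult.ValueKind != JsonValueKind.Object)
                continue;

            // keywords_result maps each requested keyword to an array of matches.
            foreach (JsonProperty entry in keywordsResult.EnumerateObject())
            {
                foreach (JsonElement m in entry.Value.EnumerateArray())
                {
                    matches.Add(new KeywordMatch(
                        entry.Name,
                        m.GetProperty("normalized_text").GetString(),
                        m.GetProperty("start_time").GetDouble(),
                        m.GetProperty("end_time").GetDouble(),
                        m.GetProperty("confidence").GetDouble()));
                }
            }
        }
        return matches;
    }
}
```

Skipping results that lack keywords_result, rather than assuming it is present, avoids the same null problem described in the question.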
Now that OnRecognize receives a SpeechRecognitionEvent with the keywords_result values filled in, I've changed the code that displayed word alternatives and their confidence scores so that it displays keyword results and their confidence scores instead, like so:
Note that using the keywords_result confidence values is much more valuable than hard-coding a check for whether the word alternatives Watson returns match your keywords and then using the confidence value there. The keywords_result.keyword[].confidence values come back much higher, because the service is already checking specifically against those words. That was the impetus for going through this process and parsing the SpeechRecognitionEvent result value so that it properly includes the keywords_result values.
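To illustrate using those confidences in a game loop: one hypothetical approach is to take the highest keyword-spotting confidence reported for the target word and compare it with a pass mark. The names, the (keyword, confidence) input shape and the 0.5 default pass mark are all assumptions for illustration; a suitable pass mark would have to come from testing with real recordings. Since the service only reports keyword matches at or above keywords_threshold, the pass mark only adds anything when it is set higher than that threshold.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical scoring helper for a "did the player say the target word?" check,
// fed with (keyword, confidence) pairs taken from keywords_result.
public static class KeywordScoring
{
    // Highest confidence reported for each keyword, matched case-insensitively.
    public static Dictionary<string, double> BestConfidence(
        IEnumerable<(string Keyword, double Confidence)> matches) =>
        matches
            .GroupBy(m => m.Keyword, StringComparer.OrdinalIgnoreCase)
            .ToDictionary(g => g.Key, g => g.Max(m => m.Confidence), StringComparer.OrdinalIgnoreCase);

    // True when the target was spotted with at least passMark confidence;
    // a keyword the service never reported counts as not said.
    public static bool SaidTarget(
        IEnumerable<(string Keyword, double Confidence)> matches, string target, double passMark = 0.5) =>
        BestConfidence(matches).TryGetValue(target, out double best) && best >= passMark;
}
```

Taking the maximum rather than, say, the first match means a clear repetition of the word later in the clip still counts.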
For some background, I'm creating a rhythm game for children with dyslexia to learn word formation, so think Guitar Hero meets Sesame Street.