It seems possible to differentiate between speakers/users with the Watson Unity SDK: the Speech to Text service can return an array that identifies which words were spoken by which speaker in a multi-person exchange. However, I cannot figure out how to make this work, particularly in the case where I am sending different utterances (spoken by different people) to the Assistant service so that it can respond to each of them accordingly.
The code snippets for parsing the Assistant's JSON output/response, as well as for OnRecognize, OnRecognizeSpeaker, SpeechRecognitionResult, and SpeakerLabelsResult, are all in place, but how do I get Watson to return the speaker labels from the server when an utterance is recognized and its intent is extracted?
Both OnRecognize and OnRecognizeSpeaker are used only once, in the Active property, so they should both be called, but only OnRecognize does the Speech to Text transcription; OnRecognizeSpeaker is never fired...
public bool Active
{
    get
    {
        return _service.IsListening;
    }
    set
    {
        if (value && !_service.IsListening)
        {
            _service.RecognizeModel = (string.IsNullOrEmpty(_recognizeModel) ? "en-US_BroadbandModel" : _recognizeModel);
            _service.DetectSilence = true;
            _service.EnableWordConfidence = true;
            _service.EnableTimestamps = true;
            _service.SilenceThreshold = 0.01f;
            _service.MaxAlternatives = 0;
            _service.EnableInterimResults = true;
            _service.OnError = OnError;
            _service.InactivityTimeout = -1;
            _service.ProfanityFilter = false;
            _service.SmartFormatting = true;
            _service.SpeakerLabels = false;
            _service.WordAlternativesThreshold = null;
            _service.StartListening(OnRecognize, OnRecognizeSpeaker);
        }
        else if (!value && _service.IsListening)
        {
            _service.StopListening();
        }
    }
}
Typically, the output of Assistant (i.e. its result) is something like the following:
Response: {"intents":[{"intent":"General_Greetings","confidence":0.9962662220001222}],"entities":[],"input":{"text":"hello eva"},"output":{"generic":[{"response_type":"text","text":"Hey!"}],"text":["Hey!"],"nodes_visited":["node_1_1545671354384"],"log_messages":[]},"context":{"conversation_id":"f922f2f0-0c71-4188-9331-09975f82255a","system":{"initialized":true,"dialog_stack":[{"dialog_node":"root"}],"dialog_turn_counter":1,"dialog_request_counter":1,"_node_output_map":{"node_1_1545671354384":{"0":[0,0,1]}},"branch_exited":true,"branch_exited_reason":"completed"}}}
I have set up intents and entities, and this list is returned by the Assistant service, but I am not sure how to get it to also consider my entities, or how to get it to respond accordingly when Speech to Text recognizes different speakers. I would appreciate some help, particularly with how to do this via Unity scripting.
I had the exact same question about dealing with the Assistant's messages, so I looked at the Assistant.OnMessage() method, which logs a string like "Response: {0}", customData["json"].ToString(), i.e. the JSON output you posted above. I personally parse that JSON in order to extract the content of messageResponse.Entities. In your example the array is empty, but if you are populating it, that is where you need to extract the values from; then in your code you can do whatever you want with them.
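As a rough sketch of that parsing step (based on the ExampleAssistant sample that ships with the SDK; the wrapper class name MyAssistantScript is just a placeholder, and the exact namespaces, model classes such as MessageResponse, and property names such as Intent, Confidence, Entity, and Value may differ slightly between SDK versions):

using System.Collections.Generic;
using FullSerializer;
using IBM.Watson.DeveloperCloud.Services.Assistant.v1;
using UnityEngine;

public class MyAssistantScript : MonoBehaviour
{
    private fsSerializer _serializer = new fsSerializer();

    private void OnMessage(object response, Dictionary<string, object> customData)
    {
        // Raw JSON, i.e. the "Response: {...}" string shown above
        Debug.Log("Response: " + customData["json"].ToString());

        // Convert the untyped response object into a typed MessageResponse
        fsData fsdata = null;
        fsResult r = _serializer.TrySerialize(response.GetType(), response, out fsdata);
        if (!r.Succeeded) { Debug.LogError(r.FormattedMessages); return; }

        MessageResponse messageResponse = new MessageResponse();
        object obj = messageResponse;
        r = _serializer.TryDeserialize(fsdata, obj.GetType(), ref obj);
        if (!r.Succeeded) { Debug.LogError(r.FormattedMessages); return; }

        // The intents and entities are now strongly typed
        foreach (var intent in messageResponse.Intents)
            Debug.Log("intent: " + intent.Intent + " (" + intent.Confidence + ")");

        foreach (var entity in messageResponse.Entities)
            Debug.Log("entity: " + entity.Entity + " -> " + entity.Value);
    }
}

Once you have the typed entities, you can branch your dialog handling on them however you like.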
Regarding recognizing different speakers: in the Active property whose code you have included, the _service.StartListening(OnRecognize, OnRecognizeSpeaker) line registers both callbacks, so perhaps put some Debug.Log statements inside their bodies to see whether they are called or not. Also, please set SpeakerLabels to true; while it is false the service does not return speaker labels, which is why OnRecognizeSpeaker never fires.
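For reference, here is a minimal sketch of the two callbacks with Debug.Log statements, modeled on the SDK's ExampleStreaming sample (the wrapper class name MySpeechToTextScript is a placeholder, and the exact event/model type names and the presence of the customData parameter depend on your SDK version):

using System.Collections.Generic;
using IBM.Watson.DeveloperCloud.Services.SpeechToText.v1;
using UnityEngine;

public class MySpeechToTextScript : MonoBehaviour
{
    private void OnRecognize(SpeechRecognitionEvent result, Dictionary<string, object> customData)
    {
        if (result == null || result.results == null)
            return;

        foreach (var res in result.results)
            foreach (var alt in res.alternatives)
                Debug.Log(string.Format("transcript: {0} ({1}, confidence {2:0.00})",
                    alt.transcript, res.final ? "final" : "interim", alt.confidence));
    }

    // Only fires if SpeakerLabels was set to true before StartListening was called
    private void OnRecognizeSpeaker(SpeakerRecognitionEvent result, Dictionary<string, object> customData)
    {
        if (result == null || result.speaker_labels == null)
            return;

        foreach (SpeakerLabelsResult label in result.speaker_labels)
            Debug.Log(string.Format("speaker {0}: from {1:0.00}s to {2:0.00}s (confidence {3:0.00})",
                label.speaker, label.from, label.to, label.confidence));
    }
}

With SpeakerLabels enabled, each SpeakerLabelsResult gives you a speaker index plus from/to timestamps, which you can match against the word timestamps from OnRecognize to work out which speaker said which words before sending that utterance on to the Assistant.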