It seems possible to differentiate among speakers/users with the Watson Unity SDK, since it can return an array identifying which words were spoken by which speakers in a multi-person exchange. However, I cannot figure out how to make this work, particularly in the case where I am sending utterances spoken by different people to the Assistant service and want it to respond to each accordingly.
The code snippets for parsing Assistant's JSON response, as well as for OnRecognize, OnRecognizeSpeaker, SpeechRecognitionResult, and SpeakerLabelsResult, are all in place, but how do I get Watson to return the speaker labels from the server when an utterance is recognized and its intent is extracted?
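For context, my two callbacks currently look roughly like this (simplified; the SpeechRecognitionEvent and SpeakerRecognitionEvent signatures and field names follow my reading of the SDK's streaming example, so treat the details as approximate):

private void OnRecognize(SpeechRecognitionEvent result, Dictionary<string, object> customData)
{
    if (result == null || result.results.Length == 0)
        return;

    foreach (var res in result.results)
    {
        foreach (var alt in res.alternatives)
        {
            // Interim and final transcriptions both arrive here.
            Log.Debug("OnRecognize()", "{0} ({1}, {2:0.00})",
                alt.transcript, res.final ? "Final" : "Interim", alt.confidence);
        }
    }
}

private void OnRecognizeSpeaker(SpeakerRecognitionEvent result, Dictionary<string, object> customData)
{
    if (result == null)
        return;

    // This callback never fires for me.
    foreach (SpeakerLabelsResult labelResult in result.speaker_labels)
    {
        Log.Debug("OnRecognizeSpeaker()", "speaker: {0} | confidence: {1} | from: {2} | to: {3}",
            labelResult.speaker, labelResult.confidence, labelResult.from, labelResult.to);
    }
}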
Both OnRecognize and OnRecognizeSpeaker are registered only once, in the Active property below, so both should be invoked. In practice, only OnRecognize does the speech-to-text transcription, and OnRecognizeSpeaker is never fired:
public bool Active
{
    get
    {
        return _service.IsListening;
    }
    set
    {
        if (value && !_service.IsListening)
        {
            // Configure the Speech-to-Text session before streaming starts.
            _service.RecognizeModel = (string.IsNullOrEmpty(_recognizeModel) ? "en-US_BroadbandModel" : _recognizeModel);
            _service.DetectSilence = true;
            _service.EnableWordConfidence = true;
            _service.EnableTimestamps = true;
            _service.SilenceThreshold = 0.01f;
            _service.MaxAlternatives = 0;
            _service.EnableInterimResults = true;
            _service.OnError = OnError;
            _service.InactivityTimeout = -1;
            _service.ProfanityFilter = false;
            _service.SmartFormatting = true;
            _service.SpeakerLabels = false;  // speaker labels are currently switched off here
            _service.WordAlternativesThreshold = null;
            // Both callbacks are registered in this single call.
            _service.StartListening(OnRecognize, OnRecognizeSpeaker);
        }
        else if (!value && _service.IsListening)
        {
            _service.StopListening();
        }
    }
}
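Once OnRecognize produces a final transcript, I forward it to Assistant roughly like this (simplified; _assistant is my Assistant v1 service instance, and _workspaceId, OnMessage, and OnFail are my own field and handlers):

private void SendToAssistant(string transcript)
{
    // "text" carries the recognized utterance to the Assistant workspace.
    MessageRequest messageRequest = new MessageRequest()
    {
        input = new Dictionary<string, object>()
        {
            { "text", transcript }
        }
    };
    _assistant.Message(OnMessage, OnFail, _workspaceId, messageRequest);
}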
Typically, the Assistant's output (i.e., its response) looks something like the following:
Response: {"intents":[{"intent":"General_Greetings","confidence":0.9962662220001222}],"entities":[],"input":{"text":"hello eva"},"output":{"generic":[{"response_type":"text","text":"Hey!"}],"text":["Hey!"],"nodes_visited":["node_1_1545671354384"],"log_messages":[]},"context":{"conversation_id":"f922f2f0-0c71-4188-9331-09975f82255a","system":{"initialized":true,"dialog_stack":[{"dialog_node":"root"}],"dialog_turn_counter":1,"dialog_request_counter":1,"_node_output_map":{"node_1_1545671354384":{"0":[0,0,1]}},"branch_exited":true,"branch_exited_reason":"completed"}}}
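In my OnMessage callback I currently pull the top intent out of that JSON roughly as follows (simplified; I am assuming customData["json"] holds the raw response, which is what the "Response: ..." line above was logged from, and I parse it with the bundled FullSerializer):

private void OnMessage(object response, Dictionary<string, object> customData)
{
    // Parse the raw response JSON into a FullSerializer fsData tree.
    fsData parsed = null;
    fsJsonParser.Parse(customData["json"].ToString(), out parsed);

    var intents = parsed.AsDictionary["intents"].AsList;
    if (intents.Count > 0)
    {
        // e.g. "General_Greetings" with confidence ~0.996 in the response above.
        string topIntent = intents[0].AsDictionary["intent"].AsString;
        Log.Debug("OnMessage()", "Top intent: {0}", topIntent);
    }
}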
I have set up intents and entities, and this list is returned by the Assistant service, but I am not sure how to get it to also consider my entities, or how to make it respond differently when the Speech-to-Text service recognizes different speakers.
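What I picture doing, once speaker labels actually arrive, is matching each word's timestamp from the recognition result against the from/to windows in the speaker labels, and then sending each speaker's words to Assistant separately. A rough sketch of the idea (untested, since OnRecognizeSpeaker never fires for me; the TimeStamp and SpeakerLabelsResult field names reflect my reading of the SDK models):

// Group recognized words by speaker using the word timestamps and the
// speaker label time windows.
private Dictionary<long, string> GroupWordsBySpeaker(
    SpeechRecognitionResult result, SpeakerLabelsResult[] labels)
{
    var utterances = new Dictionary<long, string>();
    foreach (var alternative in result.alternatives)
    {
        if (alternative.Timestamps == null)
            continue;

        foreach (var word in alternative.Timestamps)
        {
            foreach (var label in labels)
            {
                // A word belongs to the speaker whose window contains it.
                if (word.Start >= label.from && word.End <= label.to)
                {
                    if (!utterances.ContainsKey(label.speaker))
                        utterances[label.speaker] = string.Empty;
                    utterances[label.speaker] += word.Word + " ";
                    break;
                }
            }
        }
    }
    return utterances;  // e.g. { 0: "hello eva ", 1: "..." }
}

Each entry could then be passed to something like SendToAssistant above, so the response can depend on who spoke. Is that the intended way to combine the two callbacks, or does the SDK already offer this?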
I would appreciate some help, particularly with how to do this via Unity scripting.