Assistant Entities and Different Speakers

It is possible to differentiate among speakers/users with the Watson-Unity-SDK, as it seems to be able to return an array that identifies which words were spoken by which speakers in a multi-person exchange, but I cannot figure out how to execute it, particularly in the case where I am sending different utterances (spoken by different people) to the Assistant service to get a response accordingly.

The code snippets for parsing Assistant's json output/response as well as OnRecognize and OnRecognizeSpeaker and SpeechRecognitionResult and SpeakerLabelsResult are there, but how do I get Watson to return this from the server when an utterance is recognized and its intent is extracted?

Both OnRecognize and OnRecognizeSpeaker are used only once in the Active property, so they are both called, but only OnRecognize does the Speech-to-Text (transcription) and OnRecognizeSpeaker is never fired...

public bool Active
    {
        get
        {
            return _service.IsListening;
        }
        set
        {
            if (value && !_service.IsListening)
            {
                _service.RecognizeModel = (string.IsNullOrEmpty(_recognizeModel) ? "en-US_BroadbandModel" : _recognizeModel);
                _service.DetectSilence = true;
                _service.EnableWordConfidence = true;
                _service.EnableTimestamps = true;
                _service.SilenceThreshold = 0.01f;
                _service.MaxAlternatives = 0;
                _service.EnableInterimResults = true;
                _service.OnError = OnError;
                _service.InactivityTimeout = -1;
                _service.ProfanityFilter = false;
                _service.SmartFormatting = true;
                _service.SpeakerLabels = false;
                _service.WordAlternativesThreshold = null;
                _service.StartListening(OnRecognize, OnRecognizeSpeaker);
            }
            else if (!value && _service.IsListening)
            {
                _service.StopListening();
            }
        }
    }

Typically, the output of Assistant (i.e. its result) is something like the following:

Response: {"intents":[{"intent":"General_Greetings","confidence":0.9962662220001222}],"entities":[],"input":{"text":"hello eva"},"output":{"generic":[{"response_type":"text","text":"Hey!"}],"text":["Hey!"],"nodes_visited":["node_1_1545671354384"],"log_messages":[]},"context":{"conversation_id":"f922f2f0-0c71-4188-9331-09975f82255a","system":{"initialized":true,"dialog_stack":[{"dialog_node":"root"}],"dialog_turn_counter":1,"dialog_request_counter":1,"_node_output_map":{"node_1_1545671354384":{"0":[0,0,1]}},"branch_exited":true,"branch_exited_reason":"completed"}}}

I have set up intents and entities, and this list is returned by the Assistant service, but I am not sure how to get it to also consider my entities or how to get it to respond accordingly when the STT recognizes different speakers.

I would appreciate some help, particularly how to do this via Unity scripting.

回答1:

I had the exact same question about dealing with the Assistant's messages, so I looked at the Assistant.OnMessage() method that returns a string like “Response: {0}”, customData[“json”].ToString() plus the JSON output that will be something like this:

[Assistant.OnMessage()][DEBUG] Response: {“intents”:[{“intent”:”General_Greetings”,”confidence”:1}],”entities”:[],”input”:{“text”:”hello”},”output”:{“text”:[“good evening”],”nodes_visited”: etc...}

I personally parse the JSON in order to extract the content from messageResponse.Entities. In the above example, you can see that that the array is empty, but if you are populating it, then that’s where you need to extract the values from and then in your code you can do what you want.

Regarding the different speaker recognition, in the Active property whose code you have included, the _service.StartListening(OnRecognize, OnRecognizeSpeaker) line takes care of both, so perhaps put some Debug.Log statements inside their code blocks to see if they are called or not.

回答2:

Please set SpeakerLabels to True