I am trying to hook up my real-time audio endpoint, which produces a continuous audio stream, with a Direct Line Speech (DLS) endpoint that eventually interacts with my Azure bot API.
I have a websocket API that continuously receives an audio stream in binary format, and this is what I intend to forward to the DLS endpoint for continuous speech-to-text with my bot.
Based on the feedback and answer here, I have been able to hook up my Direct Line Speech endpoint with a real-time stream.
I've tried a sample WAV file, which DLS transcribes correctly, and my bot is able to retrieve the text and operate on it.
I have a ListenOnce() method that uses a PushAudioInputStream to push the audio stream to the DLS speech endpoint.
The code below shows the internals of the ListenOnce() method:
// Create a push stream
using (var pushStream = AudioInputStream.CreatePushStream())
using (var audioInput = AudioConfig.FromStreamInput(pushStream))
{
    // Create a new Dialog Service Connector
    this.connector = new DialogServiceConnector(dialogServiceConfig, audioInput);
    // ... also subscribe to events for this.connector

    // Open a connection to the Direct Line Speech channel
    this.connector.ConnectAsync();
    Debug.WriteLine("Connecting to DLS");

    pushStream.Write(dataBuffer, dataBuffer.Length);
    try
    {
        this.connector.ListenOnceAsync();
        Debug.WriteLine("Started ListenOnceAsync");
    }
    catch (Exception ex)
    {
        Debug.WriteLine($"ListenOnceAsync threw: {ex.Message}");
    }
}
The dataBuffer in the code above is the 'chunk' of binary data I've received on my websocket:
const int maxMessageSize = 1024 * 4; // 4 KB
var dataBuffer = new byte[maxMessageSize];
while (webSocket.State == WebSocketState.Open)
{
var result = await webSocket.ReceiveAsync(new ArraySegment<byte>(dataBuffer), CancellationToken.None);
if (result.MessageType == WebSocketMessageType.Close)
{
Trace.WriteLine($"Received websocket close message: {result.CloseStatus.Value}, {result.CloseStatusDescription}");
await webSocket.CloseAsync(result.CloseStatus.Value, result.CloseStatusDescription, CancellationToken.None);
}
else if (result.MessageType == WebSocketMessageType.Text)
{
var message = Encoding.UTF8.GetString(dataBuffer, 0, result.Count);
Trace.WriteLine($"Received websocket text message: {message}");
}
else // binary
{
Trace.WriteLine("Received websocket binary message");
ListenOnce(dataBuffer); // calls the ListenOnce() method above
}
}
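For context, here is a sketch of how I suspect the flow might need to be restructured: create the push stream and connector once for the lifetime of the socket, start a single recognition turn, and write each binary chunk into the same stream using result.Count rather than the full buffer length. I haven't verified this; identifiers mirror my code above, and I'm assuming DLS ends the turn on silence.

```csharp
// Untested sketch: one connector + one push stream for the whole websocket,
// instead of a new ListenOnce() per chunk.
var pushStream = AudioInputStream.CreatePushStream();
var audioInput = AudioConfig.FromStreamInput(pushStream);
this.connector = new DialogServiceConnector(dialogServiceConfig, audioInput);
// ... subscribe to events for this.connector
await this.connector.ConnectAsync();

// Start a single recognition turn; my assumption is that DLS detects
// end-of-speech on its own and finalizes the result.
var listenTask = this.connector.ListenOnceAsync();

var dataBuffer = new byte[1024 * 4];
while (webSocket.State == WebSocketState.Open)
{
    var result = await webSocket.ReceiveAsync(new ArraySegment<byte>(dataBuffer), CancellationToken.None);
    if (result.MessageType == WebSocketMessageType.Binary)
    {
        // Write only the bytes actually received -- the final chunk is usually partial.
        pushStream.Write(dataBuffer, result.Count);
    }
    else if (result.MessageType == WebSocketMessageType.Close)
    {
        pushStream.Close(); // signal end of audio so recognition can finalize
        break;
    }
}
```

Is this roughly the right shape, or does per-chunk ListenOnceAsync() work differently than I think?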
But the above code doesn't work. I believe I have a couple of issues/questions with this approach -
- I don't think I am chunking the data correctly for Direct Line Speech, so it may not be receiving the full audio needed for correct speech-to-text conversion.
- I know the DLS API supports ListenOnceAsync(), but I'm not sure whether it supports automatic end-of-speech detection (i.e., knowing when the speaker on the other side has stopped talking).
- Can I just get the websocket URL for the Direct Line Speech endpoint and assume DLS will correctly consume the raw websocket stream directly?