AudioGraph throws XAUDIO2_E_INVALID_CALL on second

Posted 2019-09-14 20:14

Question:

I'm attempting to use the AudioGraph API of UWP to reproduce a mix of synthesised speech and short notification sounds ("earcons").

UWP has a speech synthesis API which gives me a stream containing a WAV file, but I don't want to make too many assumptions about the parameters (bit rate, sample depth, etc.) so the idea is to have an AudioSubmixNode and add AudioFrameInputNodes whenever there's some speech to reproduce. There's some complexity around queueing up separate utterances so that they don't overlap.

The graph is initialised as

    private async Task InitAudioGraph()
    {
        var graphCreated = await AudioGraph.CreateAsync(new AudioGraphSettings(Windows.Media.Render.AudioRenderCategory.Speech)
        {
            QuantumSizeSelectionMode = QuantumSizeSelectionMode.LowestLatency
        });
        if (graphCreated.Status != AudioGraphCreationStatus.Success) return;

        _Graph = graphCreated.Graph;
        var outputCreated = await _Graph.CreateDeviceOutputNodeAsync();
        if (outputCreated.Status != AudioDeviceNodeCreationStatus.Success) return;

        _Mixer = _Graph.CreateSubmixNode();
        _Mixer.AddOutgoingConnection(outputCreated.DeviceOutputNode);

        _Graph.Start();
    }

and then the current utterance is played with

    class SpeechStreamPlayer : IDisposable
    {
        internal static void Play(AudioGraph graph, AudioSubmixNode mixer, SpeechSynthesisStream speechStream)
        {
            if (!speechStream.ContentType.Equals("audio/wav", StringComparison.OrdinalIgnoreCase)) throw new NotSupportedException("Content type: " + speechStream.ContentType);

            var stream = speechStream.AsStreamForRead();

            // Read the RIFF header
            uint chunkId = stream.ReadUint(); // "RIFF" - but in little-endian
            if (chunkId != 0x46464952) throw new NotSupportedException("Magic: " + chunkId);
            uint chunkSize = stream.ReadUint(); // Length of rest of stream
            uint format = stream.ReadUint(); // "WAVE"
            if (format != 0x45564157) throw new NotSupportedException("Stream format: " + format);

            // "fmt " sub-chunk
            uint subchunkId = stream.ReadUint();
            if (subchunkId != 0x20746d66) throw new NotSupportedException("Expected fmt sub-chunk, found " + subchunkId);
            uint subchunkSize = stream.ReadUint();
            uint subchunk2Off = (uint)stream.Position + subchunkSize;
            uint audioFormat = (uint)stream.ReadShort();
            uint chans = (uint)stream.ReadShort();
            uint sampleRate = stream.ReadUint();
            uint byteRate = stream.ReadUint();
            uint blockSize = (uint)stream.ReadShort();
            uint bitsPerSample = (uint)stream.ReadShort();

            // Possibly extra stuff added, so...
            stream.Seek(subchunk2Off, SeekOrigin.Begin);

            subchunkId = stream.ReadUint(); // "data"
            if (subchunkId != 0x61746164) throw new NotSupportedException("Expected data sub-chunk, found " + subchunkId);
            subchunkSize = stream.ReadUint();

            // Ok, the stream is in the correct place to start extracting data and we have the parameters.
            var props = AudioEncodingProperties.CreatePcm(sampleRate, chans, bitsPerSample);

            var frameInputNode = graph.CreateFrameInputNode(props);
            frameInputNode.AddOutgoingConnection(mixer);

            new SpeechStreamPlayer(frameInputNode, mixer, stream, blockSize);
        }

        internal event EventHandler StreamFinished;

        private SpeechStreamPlayer(AudioFrameInputNode frameInputNode, AudioSubmixNode mixer, Stream stream, uint sampleSize)
        {
            _FrameInputNode = frameInputNode;
            _Mixer = mixer;
            _Stream = stream;
            _SampleSize = sampleSize;

            _FrameInputNode.QuantumStarted += Source_QuantumStarted;
            _FrameInputNode.Start();
        }

        private AudioFrameInputNode _FrameInputNode;
        private AudioSubmixNode _Mixer;
        private Stream _Stream;
        private readonly uint _SampleSize;

        private unsafe void Source_QuantumStarted(AudioFrameInputNode sender, FrameInputNodeQuantumStartedEventArgs args)
        {
            if (args.RequiredSamples <= 0) return;
            System.Diagnostics.Debug.WriteLine("Requested {0} samples", args.RequiredSamples);

            var frame = new AudioFrame((uint)args.RequiredSamples * _SampleSize);
            using (var buffer = frame.LockBuffer(AudioBufferAccessMode.Write))
            {
                using (var reference = buffer.CreateReference())
                {
                    byte* pBuffer;
                    uint capacityBytes;

                    var directBuffer = reference as IMemoryBufferByteAccess;
                    ((IMemoryBufferByteAccess)reference).GetBuffer(out pBuffer, out capacityBytes);

                    uint bytesRemaining = (uint)_Stream.Length - (uint)_Stream.Position;
                    uint bytesToCopy = Math.Min(capacityBytes, bytesRemaining);

                    for (uint i = 0; i < bytesToCopy; i++) pBuffer[i] = (byte)_Stream.ReadByte();
                    for (uint i = bytesToCopy; i < capacityBytes; i++) pBuffer[i] = 0;

                    if (bytesRemaining <= capacityBytes)
                    {
                        Dispose();
                        StreamFinished?.Invoke(this, EventArgs.Empty);
                    }
                }
            }

            sender.AddFrame(frame);
        }

        public void Dispose()
        {
            if (_FrameInputNode != null)
            {
                _FrameInputNode.QuantumStarted -= Source_QuantumStarted;
                _FrameInputNode.Dispose();
                _FrameInputNode = null;
            }

            if (_Stream != null)
            {
                _Stream.Dispose();
                _Stream = null;
            }
        }
    }
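
The ReadUint and ReadShort calls in the listing above are not part of System.IO.Stream, and the question does not show them. A minimal sketch of what they might look like, assuming they are extension methods that read little-endian values (the byte order RIFF uses); the class name, the ReadBytes helper and the ushort return type are all guesses made for illustration:

```csharp
using System;
using System.IO;

// Hypothetical helpers: the question calls ReadUint/ReadShort but never defines them.
static class StreamReadExtensions
{
    // Reads exactly `count` bytes, looping because Stream.Read may return fewer.
    private static byte[] ReadBytes(Stream stream, int count)
    {
        var bytes = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(bytes, offset, count - offset);
            if (read == 0) throw new EndOfStreamException();
            offset += read;
        }
        return bytes;
    }

    // 32-bit unsigned value, least significant byte first.
    public static uint ReadUint(this Stream stream)
    {
        byte[] b = ReadBytes(stream, 4);
        return b[0] | ((uint)b[1] << 8) | ((uint)b[2] << 16) | ((uint)b[3] << 24);
    }

    // 16-bit unsigned value, least significant byte first.
    public static ushort ReadShort(this Stream stream)
    {
        byte[] b = ReadBytes(stream, 2);
        return (ushort)(b[0] | (b[1] << 8));
    }
}

static class Program
{
    static void Main()
    {
        // The ASCII bytes 'R','I','F','F' followed by the 16-bit value 1.
        var ms = new MemoryStream(new byte[] { 0x52, 0x49, 0x46, 0x46, 0x01, 0x00 });
        Console.WriteLine(ms.ReadUint() == 0x46464952); // True
        Console.WriteLine(ms.ReadShort());              // 1
    }
}
```

Read this way, the four ASCII bytes of "RIFF" come out as 0x46464952, which is why the magic numbers in the listing look reversed.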

This works once. When the first utterance finishes, the StreamFinished?.Invoke(this, EventArgs.Empty); notifies the queue management system that the next utterance should be played, and the line

    var frameInputNode = graph.CreateFrameInputNode(props);

throws an exception with the message "Exception from HRESULT: 0x88960001". A bit of digging shows that this corresponds to XAUDIO2_E_INVALID_CALL, but that's not very descriptive.
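
For reference, the HRESULT can be split into its fields to see which subsystem raised it. The sketch below is an illustration rather than part of the original code: the class and method names are made up, and the masks are assumed to match the HRESULT_FACILITY and HRESULT_CODE macros in winerror.h.

```csharp
using System;

// Hypothetical helper for picking an HRESULT apart.
static class HResultParts
{
    // Top bit set means failure.
    public static bool IsFailure(uint hr) => (hr & 0x80000000) != 0;

    // Same mask as the HRESULT_FACILITY macro (assumption: 13 bits).
    public static uint Facility(uint hr) => (hr >> 16) & 0x1FFF;

    // Low 16 bits carry the facility-specific error code.
    public static uint Code(uint hr) => hr & 0xFFFF;

    static void Main()
    {
        const uint hr = 0x88960001;
        Console.WriteLine($"failure={IsFailure(hr)} facility=0x{Facility(hr):X} code={Code(hr)}");
        // prints: failure=True facility=0x896 code=1
    }
}
```

Facility 0x896 is the value xaudio2.h uses for XAudio2 errors, which suggests the failure comes from the XAudio2 layer underneath AudioGraph, but the code itself says nothing about which call was invalid.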

In both cases the parameters passed to AudioEncodingProperties.CreatePcm are (22050, 1, 16).

How could I find out more detail about what went wrong? In the worst case I suppose I could throw the whole graph away and build a new one each time, but that seems rather inefficient.

Answer 1:

The problem seems to be in this part of the question:

"When the first utterance finishes, the StreamFinished?.Invoke(this, EventArgs.Empty); notifies the queue management system that the next utterance should be played"

Although the documentation for AudioFrameInputNode.QuantumStarted doesn't say anything about forbidden actions, the docs for AudioGraph.QuantumStarted say:

"The QuantumStarted event is synchronous, which means that you can't update the properties or state of the AudioGraph or the individual audio nodes in the handler for this event. Attempting perform an operation such as stopping the audio graph or adding, removing, or starting an individual audio node will result in an exception being thrown."

It appears that the same restriction also applies to the node's own QuantumStarted event, so creating the next frame input node from inside the handler is what throws.

The simple solution is to raise the event from another thread, so that the graph manipulation happens outside the handler:

    Task.Run(() => StreamFinished?.Invoke(this, EventArgs.Empty));
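
To illustrate the pattern without needing a UWP project, here is a small stand-in. FakeNode is hypothetical and is not the AudioGraph API: it imitates the "no graph changes inside the handler" rule with a per-thread flag, which is an assumption about the behaviour and not a description of how AudioGraph enforces it.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for an audio node; NOT the UWP API.
class FakeNode
{
    // Assumption: the restriction is modelled as a flag on the calling thread.
    [ThreadStatic] private static bool _inCallback;

    public event EventHandler QuantumStarted;

    public void RaiseQuantum()
    {
        _inCallback = true;
        try { QuantumStarted?.Invoke(this, EventArgs.Empty); }
        finally { _inCallback = false; }
    }

    // Stands in for calls such as CreateFrameInputNode.
    public static void MutateGraph()
    {
        if (_inCallback) throw new InvalidOperationException("Graph is mid-quantum");
    }
}

static class Demo
{
    // Returns (directThrew, deferredThrew) for the two ways of reacting to the event.
    public static (bool, bool) Run()
    {
        bool directThrew = false, deferredThrew = false;
        var done = new ManualResetEventSlim();
        var node = new FakeNode();

        node.QuantumStarted += (s, e) =>
        {
            // Mutating directly inside the handler fails, as in the question.
            try { FakeNode.MutateGraph(); }
            catch (InvalidOperationException) { directThrew = true; }

            // Deferring to the thread pool runs the same call outside the handler.
            Task.Run(() =>
            {
                try { FakeNode.MutateGraph(); }
                catch (InvalidOperationException) { deferredThrew = true; }
                finally { done.Set(); }
            });
        };

        node.RaiseQuantum();
        done.Wait();
        return (directThrew, deferredThrew);
    }

    static void Main()
    {
        var (direct, deferred) = Run();
        Console.WriteLine($"direct threw: {direct}, deferred threw: {deferred}");
        // prints: direct threw: True, deferred threw: False
    }
}
```

Note that the handler in the question also calls Dispose() on the node before sender.AddFrame(frame); under the same documented restriction it may be worth deferring that call as well.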