Audio equivalent of SPS and PPS when muxing Annex

Posted 2019-08-09 11:07

Question:

I'm using the Bento4 library to mux an Annex B TS (MPEG-2 transport stream) file from my H.264 video and AAC audio streams, which are generated by VideoToolbox and AVFoundation respectively, as source data for an HLS (HTTP Live Streaming) stream. This question is not necessarily Bento4-specific: I'm trying to understand the underlying concepts so that I can accomplish the task, preferably by using Apple libs.

So far, I've figured out how to create an AP4_AvcSampleDescription by getting various kinds of data out of my CMVideoFormatDescriptionRef, and most importantly by generating an SPS and PPS using index 0 and 1 respectively of CMVideoFormatDescriptionGetH264ParameterSetAtIndex that I can just stick as byte buffers into Bento4. Great, that's all the header information I need so that I can ask Bento4 to mux video into a ts file!

Now I'm trying to mux audio into the same file. I'm using my CMAudioFormatDescriptionRef to get the required information to construct my AP4_MpegAudioSampleDescription, which Bento4 uses to make the necessary QT atoms and headers. However, one of the fields is a "decoder info" byte buffer, with no explanation of what it is and no code to generate one from data. I had hoped to find a CMAudioFormatDescriptionGetDecoderInfo or something similar, but I can't find anything like that. Is there such a function in any Apple library? Or is there a nice spec that I haven't found on how to generate this data?

Or alternatively, am I walking down the wrong path? Is there an easier way to mux ts files from a Mac/iOS code base?

Answer 1:

Muxing audio into an MPEG-TS is surprisingly easy, and does not require a complex header like a video stream does! All it needs is a 7-byte ADTS header in front of each sample buffer before you write it out as a PES packet.

Bento4 only uses the "DecoderInfo" buffer in order to parse it into an AP4_Mp4AudioDecoderConfig instance, so that it can extract the information needed for the ADTS header. Instead of being so roundabout in acquiring this data, I made a copy-paste of AP4_Mpeg2TsAudioSampleStream::WriteSample that writes a CMSampleBufferRef. It can easily be generalized for other audio frameworks, but I'll just paste it as-is here for reference:

// These two functions are copy-pasted from Ap4Mpeg2Ts.cpp
static unsigned int GetSamplingFrequencyIndex(unsigned int sampling_frequency) { ... }
static void
MakeAdtsHeader(unsigned char *bits,
               size_t  frame_size,
               unsigned int  sampling_frequency_index,
               unsigned int  channel_configuration) { ... }

static const size_t kAdtsHeaderLength = 7;

- (void)appendAudioSampleBuffer2:(CMSampleBufferRef)sampleBuffer
{
    // Get the actual audio data from the block buffer.
    CMBlockBufferRef blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
    size_t blockBufferLength = CMBlockBufferGetDataLength(blockBuffer);

    // Get the audio meta-data from its AudioFormatDescRef
    CMAudioFormatDescriptionRef audioFormat = CMSampleBufferGetFormatDescription(sampleBuffer);
    const AudioStreamBasicDescription *asbd = CMAudioFormatDescriptionGetStreamBasicDescription(audioFormat);

    // These are the values we will need to build our ADTS header
    unsigned int sample_rate = asbd->mSampleRate;
    unsigned int channel_count = asbd->mChannelsPerFrame;
    unsigned int sampling_frequency_index = GetSamplingFrequencyIndex(sample_rate);
    unsigned int channel_configuration = channel_count;

    // Create a byte buffer with first the header, and then the sample data.
    NSMutableData *buffer = [NSMutableData dataWithLength:kAdtsHeaderLength + blockBufferLength];
    MakeAdtsHeader((unsigned char*)[buffer mutableBytes], blockBufferLength, sampling_frequency_index, channel_configuration);
    CMBlockBufferCopyDataBytes(blockBuffer, 0, blockBufferLength, ((char*)[buffer mutableBytes])+kAdtsHeaderLength);

    // Calculate a timestamp int64 that Bento4 can use, by converting our CMTime into an Int64 in the timescale of the audio stream.
    CMTime presentationTime = CMSampleBufferGetPresentationTimeStamp(sampleBuffer);
    AP4_UI64 ts = CMTimeConvertScale(presentationTime, _audioStream->m_TimeScale, kCMTimeRoundingMethod_Default).value;

    _audioStream->WritePES(
        (const unsigned char*)[buffer bytes],
        (unsigned int)[buffer length],
        ts,
        false, // don't need a decode timestamp for audio
        ts,
        true, // do write a presentation timestamp so we can sync a/v
        *_output
    );
}
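
For readers who do not have Ap4Mpeg2Ts.cpp at hand, here is a rough sketch of what the two elided helpers need to do. It is not Bento4's code: the names SamplingFrequencyIndexSketch and WriteAdtsHeaderSketch are made up for illustration, and the sketch assumes AAC-LC, no CRC, and exactly one AAC frame per header. The bit layout follows the usual 7-byte ADTS header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not Bento4's): maps a sample rate in Hz to the 4-bit
// MPEG-4 sampling_frequency_index, or returns -1 if the rate is not in the table.
static int SamplingFrequencyIndexSketch(unsigned int hz) {
    static const unsigned int kRates[13] = {96000, 88200, 64000, 48000, 44100,
                                            32000, 24000, 22050, 16000, 12000,
                                            11025, 8000,  7350};
    for (int i = 0; i < 13; ++i) {
        if (kRates[i] == hz) return i;
    }
    return -1;
}

// Hypothetical helper (not Bento4's): fills a 7-byte ADTS header for one
// AAC-LC frame of payload_size bytes. Assumes no CRC (protection_absent = 1).
static void WriteAdtsHeaderSketch(uint8_t out[7],
                                  size_t payload_size,
                                  unsigned int sampling_frequency_index,
                                  unsigned int channel_configuration) {
    // The 13-bit frame length field counts the header as well as the payload.
    const size_t frame_length = payload_size + 7;
    assert(frame_length < (1u << 13));
    assert(sampling_frequency_index < 13 && channel_configuration < 8);

    out[0] = 0xFF;  // syncword, high 8 bits
    out[1] = 0xF1;  // syncword low 4 bits, MPEG-4, layer 0, no CRC
    out[2] = static_cast<uint8_t>(0x40 |  // profile field 01 = AAC-LC
                                  (sampling_frequency_index << 2) |
                                  (channel_configuration >> 2));
    out[3] = static_cast<uint8_t>(((channel_configuration & 0x3) << 6) |
                                  ((frame_length >> 11) & 0x3));
    out[4] = static_cast<uint8_t>((frame_length >> 3) & 0xFF);
    out[5] = static_cast<uint8_t>(((frame_length & 0x7) << 5) |
                                  0x1F);  // buffer fullness, high 5 bits (all ones)
    out[6] = 0xFC;  // buffer fullness low 6 bits, one raw data block
}
```

Note that a sample rate outside the table yields -1 here, so a caller would have to decide how to handle that case before building the header.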


Answer 2:

The 'decoder info' byte buffer needed by Bento4 to create an AP4_MpegAudioSampleDescription instance is the codec initialization data, which is codec-specific. For AAC-LC audio, it is typically 2 bytes of data (for HE-AAC you would get a few more bytes), the details of which are specified in the AAC spec. For example, a 44.1kHz, stereo, AAC-LC stream will have [0x12,0x10] as init data. In most Apple APIs, this type of codec initialization data is conveyed through what they call 'Magic Cookies'. It is likely that the function CMAudioFormatDescriptionGetMagicCookie will return what you need here.
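
To make the [0x12,0x10] example concrete, here is a small sketch of how those two bytes can be packed for the simple AAC-LC case. The function name MakeAacConfigSketch is hypothetical and not part of Bento4 or any Apple API. It assumes the basic layout of 5 bits of audio object type, 4 bits of sampling frequency index and 4 bits of channel configuration, followed by 3 zero bits, and it does not cover the escape values or the HE-AAC extensions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: packs the 2-byte AAC init data for the simple case.
//   object_type: 5 bits (2 = AAC-LC)
//   sampling_frequency_index: 4 bits (3 = 48 kHz, 4 = 44.1 kHz)
//   channel_configuration: 4 bits (1 = mono, 2 = stereo)
// The trailing 3 bits are left at zero.
static std::array<uint8_t, 2> MakeAacConfigSketch(unsigned int object_type,
                                                  unsigned int sampling_frequency_index,
                                                  unsigned int channel_configuration) {
    // Escape values (object type 31, frequency index 15) are out of scope here.
    assert(object_type > 0 && object_type < 31);
    assert(sampling_frequency_index < 13);
    assert(channel_configuration < 16);

    const unsigned int bits = (object_type << 11) |
                              (sampling_frequency_index << 7) |
                              (channel_configuration << 3);
    return {static_cast<uint8_t>(bits >> 8), static_cast<uint8_t>(bits & 0xFF)};
}
```

With object type 2, frequency index 4 and channel configuration 2, this produces 0x12 followed by 0x10, which matches the 44.1kHz stereo example above.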