I am trying to write an audio resampler using Android's MediaCodec suite.
I am currently feeding a stereo MP3 file into a MediaExtractor, whose output is then decoded by a MediaCodec. The sample rate of the source audio is 48000 Hz.
What I don't understand is the first few output buffers I receive from the decoder (sizes in bytes, times in microseconds):
- size 0, time 0
- size 0, time 24000
- size 4312, time 48000
- size 4608, time 72000
- size 4608, time 96000
- etc.
From this answer, this answer, and this article, I believe the first two buffers are merely propagated "encoder delay" and may just be thrown out. However, the third buffer I have listed throws me for a loop.
For buffer #4 (and onward), the math works out:
((4608 bytes) / (2 bytes/sample) / (2 channels))
/ ((48,000 samples/sec) / (1,000,000 us/sec))
= 24,000 us (i.e. the change in time between buffers)
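The same arithmetic can be applied to the short buffer as a quick check. The following is only a sketch: the class name PcmTiming and its methods are made up for illustration, and it assumes 16-bit PCM output (2 bytes per sample) in 2 channels, as in the calculation above.

```java
// Hypothetical helper, not part of the Android API.
// Assumes interleaved 16-bit PCM, so one frame = bytesPerSample * channels bytes.
final class PcmTiming {
    private PcmTiming() {}

    /** Number of PCM frames (one sample per channel) held in a buffer of the given size. */
    static long frames(long bytes, int bytesPerSample, int channels) {
        return bytes / ((long) bytesPerSample * channels);
    }

    /** Playback duration of the buffer in microseconds, truncated to a whole number. */
    static long durationUs(long bytes, int sampleRate, int bytesPerSample, int channels) {
        return frames(bytes, bytesPerSample, channels) * 1_000_000L / sampleRate;
    }

    public static void main(String[] args) {
        int sampleRate = 48_000;
        int bytesPerSample = 2;
        int channels = 2;
        long[] sizes = {4312, 4608}; // buffer #3 and buffer #4 from the list above
        for (long size : sizes) {
            System.out.println(size + " bytes -> "
                    + frames(size, bytesPerSample, channels) + " frames, "
                    + durationUs(size, sampleRate, bytesPerSample, channels) + " us");
        }
    }
}
```

Under these assumptions buffer #3 holds 1078 frames, or about 22458 us of audio, while its timestamp slot is 24000 us wide. The shortfall is 296 bytes, i.e. 74 frames or roughly 1.54 ms.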
What is going on with buffer #3 though? A straightforward take on the data suggests that the audio begins playing at time 48000 us and then pauses momentarily before the 72000 us mark, at which point it begins to play continuously with no breaks.
It seems more likely that there are 296 bytes of hidden zeros (4608 - 4312) before the data of buffer #3, but this offset doesn't seem to be indicated by any variable in my code. Can anyone shed some light on this for me?
As far as I have figured out, the audio MediaCodec components* don't really care which timestamp is associated with each buffer. Instead, they seem to recalculate what the timestamp should be for each piece of data, using the specified bitrate and assuming there are no holes in the flow of bytes.
As supporting evidence for this hypothesis, part of the solution in this answer simply increments the timestamp values rather than calculating the correct timestamps.
So in the example from this question, the audio MediaCodec components* would completely ignore all of the timestamp values. The first byte of buffer #3 would be assumed by MediaCodec to be at time 0, and the time of the first byte of buffer #4 would be inferred from the number of bytes processed so far rather than taken as 24000 or 48000.
*namely a MediaCodec object or some related custom component
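To make the hypothesis concrete, here is a rough sketch of what "inferring timestamps from the byte count" could look like. This is not MediaCodec's actual implementation, which I have not inspected. ByteCountClock is a made-up class, and it assumes 16-bit stereo PCM with no gaps in the data.

```java
// Hypothetical illustration of the hypothesis above; not an Android API.
// Timestamps are derived purely from how many PCM bytes have been seen so far.
final class ByteCountClock {
    private final int sampleRate;
    private final int bytesPerFrame;
    private long bytesSeen = 0;

    ByteCountClock(int sampleRate, int bytesPerSample, int channels) {
        this.sampleRate = sampleRate;
        this.bytesPerFrame = bytesPerSample * channels;
    }

    /**
     * Returns the inferred start time (in microseconds) of the next buffer,
     * then advances the clock by that buffer's size. Any timestamp the
     * caller might have for the buffer is deliberately not consulted.
     */
    long nextPresentationTimeUs(int bufferSizeBytes) {
        long startUs = (bytesSeen / bytesPerFrame) * 1_000_000L / sampleRate;
        bytesSeen += bufferSizeBytes;
        return startUs;
    }

    public static void main(String[] args) {
        ByteCountClock clock = new ByteCountClock(48_000, 2, 2);
        int[] sizes = {0, 0, 4312, 4608, 4608}; // buffer sizes from the question
        for (int size : sizes) {
            System.out.println("size " + size + " -> inferred time "
                    + clock.nextPresentationTimeUs(size) + " us");
        }
    }
}
```

With the sizes from the question, the two empty buffers and buffer #3 all get an inferred time of 0, buffer #4 gets 22458 us, and the one after it gets 46458 us. In other words, the 296 missing bytes never show up as a gap.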
Note: the MediaCodec video encoder does seem to care about timestamps.