How to reduce latency in MediaCodec video/avc decoding

Posted 2019-01-09 13:57

I performed some simple timing of MoviePlayer.java in the Grafika MediaCodec sample code running on a Nexus 5. I put a log statement at these locations:

At line 203 just before

decoder.queueInputBuffer

At line 244 after

decoder.dequeueOutputBuffer

I correlated the log statements using presentationTimeUs.
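
For reference, here is roughly what that instrumentation looks like. This is a sketch rather than the exact patch; the names (inputBufIndex, chunkSize, presentationTimeUs, mBufferInfo, TIMEOUT_USEC) are assumed to match the extract loop in MoviePlayer.java at the time of writing:

    // Just before the input buffer is submitted (~line 203):
    Log.i(TAG, "queueInputBuffer index/pts, " + inputBufIndex + "," + presentationTimeUs);
    decoder.queueInputBuffer(inputBufIndex, 0, chunkSize, presentationTimeUs, 0 /*flags*/);

    // Just after an output buffer comes back (~line 244):
    int decoderStatus = decoder.dequeueOutputBuffer(mBufferInfo, TIMEOUT_USEC);
    if (decoderStatus >= 0) {
        Log.i(TAG, "dequeueOutputBuffer index/pts, " + decoderStatus + ","
                + mBufferInfo.presentationTimeUs);
    }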

Here is an excerpt from logcat:

01-29 10:56:43.295: I/Grafika(21286): queueInputBuffer index/pts, 2,0
01-29 10:56:43.305: I/Grafika(21286): queueInputBuffer index/pts, 0,33100
01-29 10:56:43.315: I/Grafika(21286): queueInputBuffer index/pts, 3,66466
01-29 10:56:43.325: I/Grafika(21286): queueInputBuffer index/pts, 1,99833
01-29 10:56:43.325: I/Grafika(21286): queueInputBuffer index/pts, 2,133200
01-29 10:56:43.335: I/Grafika(21286): queueInputBuffer index/pts, 0,166566
01-29 10:56:43.345: I/ATSParser(21286): discontinuity on stream pid 0x1011
01-29 10:56:43.345: I/ATSParser(21286): discontinuity on stream pid 0x1100
01-29 10:56:43.345: I/Grafika(21286): queueInputBuffer index/pts, 3,199933
01-29 10:56:43.345: I/Grafika(21286): dequeueOutputBuffer index/pts, 7,0
01-29 10:56:43.345: I/Grafika(21286): queueInputBuffer index/pts, 1,300033
01-29 10:56:43.355: I/Grafika(21286): dequeueOutputBuffer index/pts, 6,33100
01-29 10:56:43.385: I/Grafika(21286): queueInputBuffer index/pts, 2,333400
01-29 10:56:43.385: I/Grafika(21286): dequeueOutputBuffer index/pts, 5,66466
01-29 10:56:43.415: I/Grafika(21286): queueInputBuffer index/pts, 0,366766
01-29 10:56:43.415: I/Grafika(21286): dequeueOutputBuffer index/pts, 4,99833
01-29 10:56:43.445: I/Grafika(21286): queueInputBuffer index/pts, 3,400133
01-29 10:56:43.445: I/Grafika(21286): dequeueOutputBuffer index/pts, 3,133200

I found that the time difference from when the first input buffer is queued to when the corresponding output buffer is dequeued is about 50 ms. That seems like a lot of time for hardware-accelerated decoding.

Is there a way to decrease this latency?

1 Answer

做个烂人 · 2019-01-09 14:36

I think you're seeing some effects unique to the first frame. I repeated your experiment, with the further addition of forcing doRender = false around line 244 to avoid the sleep calls used to manage the output frame rate. I see:

01-29 14:05:36.552  9115  9224 I Grafika : queueInputBuffer index/pts, 2,0
01-29 14:05:36.562  9115  9224 I Grafika : queueInputBuffer index/pts, 0,66655
01-29 14:05:36.572  9115  9224 I Grafika : queueInputBuffer index/pts, 3,133288
01-29 14:05:36.582  9115  9224 I Grafika : queueInputBuffer index/pts, 1,199955

01-29 14:05:36.602  9115  9224 I Grafika : dequeueOutputBuffer index/pts, 4,0
01-29 14:05:36.602  9115  9224 I Grafika : dequeueOutputBuffer index/pts, 3,66655
01-29 14:05:36.602  9115  9224 I Grafika : dequeueOutputBuffer index/pts, 2,133288
01-29 14:05:36.612  9115  9224 I Grafika : dequeueOutputBuffer index/pts, 4,199955

(Extraneous lines removed for clarity.) This confirms your results. Note that, while there was a 50ms lag between input and output for pts=0, the subsequent output frames were available almost instantly. The video I used was "camera-test.mp4" (720p camera output).
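
If you want to reproduce the no-render run, it's a one-line change; assuming the release call around line 244 looks like it does in MoviePlayer.java:

    // Force rendering off so the frame-rate pacing (and its sleeps) never
    // runs; normally doRender = (mBufferInfo.size != 0).
    boolean doRender = false;
    decoder.releaseOutputBuffer(decoderStatus, doRender);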

For insight into why this is happening, take a look at the other stuff in the log, and where it appears. Starting from the first queueInputBuffer log line, count the number of logs that appear between that and the first dequeueOutputBuffer line. I count about 60 lines of output from OMX-VDEC-1080P on mine. Now count how many OMX-VDEC lines appear after output buffers start appearing. I see none until the video ends.

The video decoder is clearly deferring some expensive initialization until data is available. So the next question is... how much data does it need? I added a 500ms sleep after submitting the second frame (pts==66633). The result: two frames submitted, 500ms pause, two frames submitted, big pile of OMX-VDEC logs. So it seems that the decoder wants several frames before it will start.
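
The probe was nothing fancy; a sketch, assuming inputChunk is the frame counter the extract loop already keeps:

    decoder.queueInputBuffer(inputBufIndex, 0, chunkSize, presentationTimeUs, 0);
    inputChunk++;
    if (inputChunk == 2) {
        try {
            Thread.sleep(500);   // stall the feeder; watch where the OMX-VDEC logs land
        } catch (InterruptedException ie) {
            // fine to ignore in a throwaway experiment
        }
    }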

This suggests that we can reduce the start-up latency by feeding the first few frames in quickly. To test that, I changed TIMEOUT_USEC to zero, so it'll respond quickly but burn CPU. New log output (with your logging in place, no sleep, no rendering):

01-29 14:29:04.542 10560 10599 I Grafika : queueInputBuffer index/pts, 0,0
01-29 14:29:04.542 10560 10599 I Grafika : queueInputBuffer index/pts, 2,66633
01-29 14:29:04.542 10560 10599 I Grafika : queueInputBuffer index/pts, 3,133288
...
01-29 14:29:04.572 10560 10599 I Grafika : dequeueOutputBuffer index/pts, 4,0
01-29 14:29:04.572 10560 10599 I Grafika : dequeueOutputBuffer index/pts, 3,66633
01-29 14:29:04.572 10560 10599 I Grafika : dequeueOutputBuffer index/pts, 2,133288

By feeding the initial frames in quickly, we've reduced the initial latency from 50ms to 30ms.

(Notice how all the timestamps end in '2'? The timer used for the log times appears to be rounding to the nearest 10ms, so the actual time delta may be slightly different.)
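
For completeness, the change is just the shared timeout constant that both dequeue calls in MoviePlayer.java use (10000 usec at the time of writing):

    private static final int TIMEOUT_USEC = 0;   // was 10000; both dequeue calls now poll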

The reason we're feeding the initial frames slowly is that we're trying to drain output from the decoder after each input buffer is submitted, waiting 10ms for output that never appears. My initial thought is that we want to wait for a timeout on either dequeueInputBuffer() or dequeueOutputBuffer(), but not both -- maybe use a timeout on input and a quick poll for output at first, then switch to a timeout on output when we run out of input to feed. (For that matter, the initial timeout for input could be -1, since we know nothing is going to happen until the first input buffer is queued.)
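
Here's a rough sketch of that split-timeout loop. It's illustrative, not a drop-in replacement for MoviePlayer.java's extract loop; it assumes the usual MediaExtractor/MediaCodec setup and skips format-change and error handling:

    MediaCodec.BufferInfo bufferInfo = new MediaCodec.BufferInfo();
    ByteBuffer[] decoderInputBuffers = decoder.getInputBuffers();
    boolean inputDone = false;
    boolean outputDone = false;

    while (!outputDone) {
        if (!inputDone) {
            // Block on the input side; nothing can appear on the output side
            // until the decoder has been fed (initially this could even be -1).
            int inputBufIndex = decoder.dequeueInputBuffer(10000 /*usec*/);
            if (inputBufIndex >= 0) {
                ByteBuffer inputBuf = decoderInputBuffers[inputBufIndex];
                int chunkSize = extractor.readSampleData(inputBuf, 0);
                if (chunkSize < 0) {
                    decoder.queueInputBuffer(inputBufIndex, 0, 0, 0L,
                            MediaCodec.BUFFER_FLAG_END_OF_STREAM);
                    inputDone = true;
                } else {
                    decoder.queueInputBuffer(inputBufIndex, 0, chunkSize,
                            extractor.getSampleTime(), 0);
                    extractor.advance();
                }
            }
        }

        // Quick poll for output while input remains; block once it's gone.
        long outputTimeoutUsec = inputDone ? 10000 : 0;
        int decoderStatus = decoder.dequeueOutputBuffer(bufferInfo, outputTimeoutUsec);
        if (decoderStatus >= 0) {
            decoder.releaseOutputBuffer(decoderStatus, bufferInfo.size != 0);
            if ((bufferInfo.flags & MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) {
                outputDone = true;
            }
        }
    }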

I don't know if there is a way to reduce the latency further.
