I aim to use the Android MediaCodec for decoding a video stream, then use the output images for further image processing in native code.
Platform: ASUS TF700T, Android 4.1.1. Test stream: H.264 full HD @ 24 frm/s.
With the Tegra-3 SoC inside, I am counting on hardware support for the video decoding. Functionally, my application behaves as expected: I can indeed access the decoder images and process them properly. However, I experience a very high decoder CPU load.
In the following experiments, process/thread load is measured by "top -m 32 -t" in adb shell. To get reliable output from "top", all 4 CPU cores are forced active by running a few threads that loop forever at lowest priority. That all cores stay online is confirmed by repeatedly executing "cat /sys/devices/system/cpu/cpu[0-3]/online". To keep things simple, there is only video decoding, no audio, and there is no timing control, so the decoder runs as fast as it can.
First experiment: run the application and call the JNI processing function, with all further processing calls commented out. Results:
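The "threads looping forever at lowest priority" trick described above could look roughly like the sketch below. The class name CoreSpinner and its methods are hypothetical and not taken from the application; it also assumes that Thread.MIN_PRIORITY is low enough not to disturb the measurement, which depends on how the platform maps Java priorities to scheduler priorities.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: keeps CPU cores busy with lowest-priority spinning threads. */
public final class CoreSpinner {
    private final List<Thread> threads = new ArrayList<Thread>();
    private volatile boolean running;

    /** Starts `count` busy-looping daemon threads at Thread.MIN_PRIORITY. */
    public synchronized void start(int count) {
        running = true;
        for (int i = 0; i < count; i++) {
            Thread t = new Thread(new Runnable() {
                @Override
                public void run() {
                    long spins = 0;
                    // The volatile read keeps the loop from being optimized away.
                    while (running) {
                        spins++;
                    }
                }
            }, "core-spinner-" + i);
            t.setPriority(Thread.MIN_PRIORITY);
            t.setDaemon(true); // do not keep the process alive on exit
            t.start();
            threads.add(t);
        }
    }

    /** Signals all spinner threads to exit and waits for them. */
    public synchronized void stop() {
        running = false;
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
        threads.clear();
    }

    /** Number of spinner threads that are currently alive. */
    public synchronized int aliveCount() {
        int alive = 0;
        for (Thread t : threads) {
            if (t.isAlive()) {
                alive++;
            }
        }
        return alive;
    }
}
```

Typical use would be to call start(4) before the measurement and stop() afterwards; whether the cores really stay online still has to be checked via /sys/devices/system/cpu as described.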
- throughput: 25 frm/s
- 1% load of thread VideoDecoder of the application
- 24% load of thread Binder_3 of process /system/bin/mediaserver
It seems that the decoding speed is CPU limited (25% of a quad-core CPU, i.e. one core fully loaded)... When enabling the output processing, decoded images are correct and the application works. The only problem is the excessive CPU load for decoding.
After tons of experiments, I considered giving the MediaCodec a surface to draw its result. In all other aspects, the code is identical. Results:
- throughput 55 frm/s (nice!!)
- 2% load of thread VideoDecoder of the application
- 1% load of thread mediaserver of process /system/bin/mediaserver
Indeed, the video is shown on the provided Surface. Since there is hardly any cpu load, this must be hardware accelerated...
It seems that the MediaCodec only uses hardware acceleration if a Surface is provided?
So far, so good. I was already inclined to use the Surface as a work-around (not required, but in some cases even a nice-to-have). But when a Surface is provided, I cannot access the output images! The result is an access violation in the native code.
This really puzzles me! I did not see any mention of access limitations or anything of the kind in the documentation http://developer.android.com/reference/android/media/MediaCodec.html. Nor was anything in this direction mentioned in the Google I/O presentation http://www.youtube.com/watch?v=RQws6vsoav8.
So: how can I use the hardware-accelerated Android MediaCodec decoder and still access the images in native code? How can I avoid the access violation? Any help is appreciated, as is any explanation or hint.
I am pretty sure the MediaExtractor and MediaCodec are used properly, since the application is functionally OK (as long as I do not provide a Surface). It is still pretty experimental, and a good API design is on the todo list ;-)
Note that the only difference between the two experiments is variable mSurface: null or an actual Surface in "mDecoder.configure(mediaFormat, mSurface, null, 0);"
Initialization code:
    mExtractor = new MediaExtractor();
    mExtractor.setDataSource(mPath);
    // Locate first video stream
    for (int i = 0; i < mExtractor.getTrackCount(); i++) {
        mediaFormat = mExtractor.getTrackFormat(i);
        String mime = mediaFormat.getString(MediaFormat.KEY_MIME);
        Log.i(TAG, String.format("Stream %d/%d %s", i, mExtractor.getTrackCount(), mime));
        if (streamId == -1 && mime.startsWith("video/")) {
            streamId = i;
        }
    }
    if (streamId == -1) {
        Log.e(TAG, "Can't find video info in " + mPath);
        return;
    }
    mExtractor.selectTrack(streamId);
    mediaFormat = mExtractor.getTrackFormat(streamId);
    mDecoder = MediaCodec.createDecoderByType(mediaFormat.getString(MediaFormat.KEY_MIME));
    mDecoder.configure(mediaFormat, mSurface, null, 0);
    width = mediaFormat.getInteger(MediaFormat.KEY_WIDTH);
    height = mediaFormat.getInteger(MediaFormat.KEY_HEIGHT);
    Log.i(TAG, String.format("Image size: %dx%d format: %s", width, height, mediaFormat.toString()));
    JniGlue.decoutStart(width, height);
Decoder loop (running in a separate thread):
    ByteBuffer[] inputBuffers = mDecoder.getInputBuffers();
    ByteBuffer[] outputBuffers = mDecoder.getOutputBuffers();
    while (!isEOS && !Thread.interrupted()) {
        int inIndex = mDecoder.dequeueInputBuffer(10000);
        if (inIndex >= 0) {
            // Valid buffer returned
            int sampleSize = mExtractor.readSampleData(inputBuffers[inIndex], 0);
            if (sampleSize < 0) {
                Log.i(TAG, "InputBuffer BUFFER_FLAG_END_OF_STREAM");
                mDecoder.queueInputBuffer(inIndex, 0, 0, 0, MediaCodec.BUFFER_FLAG_END_OF_STREAM);
                isEOS = true;
            } else {
                mDecoder.queueInputBuffer(inIndex, 0, sampleSize, mExtractor.getSampleTime(), 0);
                mExtractor.advance();
            }
        }
        int outIndex = mDecoder.dequeueOutputBuffer(info, 10000);
        if (outIndex >= 0) {
            // Valid buffer returned
            ByteBuffer buffer = outputBuffers[outIndex];
            JniGlue.decoutFrame(buffer, info.offset, info.size);
            mDecoder.releaseOutputBuffer(outIndex, true);
        } else {
            // Some INFO_* value returned
            switch (outIndex) {
                case MediaCodec.INFO_OUTPUT_BUFFERS_CHANGED:
                    Log.i(TAG, "RunDecoder: INFO_OUTPUT_BUFFERS_CHANGED");
                    outputBuffers = mDecoder.getOutputBuffers();
                    break;
                case MediaCodec.INFO_OUTPUT_FORMAT_CHANGED:
                    Log.i(TAG, "RunDecoder: New format " + mDecoder.getOutputFormat());
                    break;
                case MediaCodec.INFO_TRY_AGAIN_LATER:
                    // Timeout - simply ignore
                    break;
                default:
                    // Some other value, simply ignore
                    break;
            }
        }
        if ((info.flags & MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) {
            Log.d(TAG, "RunDecoder: OutputBuffer BUFFER_FLAG_END_OF_STREAM");
            isEOS = true;
        }
    }
I use the MediaCodec API on a Nexus 4 and get an output color format of QOMX_COLOR_FormatYUV420PackedSemiPlanar64x32Tile2m8ka. I think this is a hardware-specific format that can only be rendered by hardware. Interestingly, when I configure the MediaCodec with null versus an actual Surface, the output buffer length is an actual value and 0, respectively. I don't know why. You could run experiments on different devices to gather more results. For more on hardware acceleration, see http://www.saschahlusiak.de/2012/10/hardware-acceleration-on-sgs2-with-android-4-0/
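Given the observation that the output buffer length can be 0 when a Surface is configured, one defensive option is to validate the buffer before handing it to native code. The following is only a sketch: the class OutputBufferGuard and its method are hypothetical names, and whether a particular device reports a null buffer, a zero size, or something else in Surface mode is an assumption that would need to be checked per device.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: decides whether a decoder output buffer is safe to read on the CPU. */
public final class OutputBufferGuard {
    private OutputBufferGuard() {}

    /**
     * Returns true only if the buffer exists and the region [offset, offset + size)
     * lies entirely inside it. A missing or empty buffer returns false, so the
     * caller can skip the JNI call instead of dereferencing invalid memory.
     */
    public static boolean isCpuReadable(ByteBuffer buffer, int offset, int size) {
        if (buffer == null) {
            return false;
        }
        if (offset < 0 || size <= 0) {
            return false;
        }
        // Use long arithmetic so offset + size cannot overflow.
        return (long) offset + (long) size <= (long) buffer.capacity();
    }
}
```

In the decoder loop from the question, such a check would sit between looking up outputBuffers[outIndex] and the call to JniGlue.decoutFrame, with info.offset and info.size as arguments.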
If you configure an output Surface, the decoded data is written to a graphic buffer that can be used as an OpenGL ES texture (via the "external texture" extension). The various bits of hardware get to hand data around in a format they like, and the CPU doesn't have to copy the data.
If you don't configure a Surface, the output goes into a java.nio.ByteBuffer. There's at least one buffer copy to get the data from the MediaCodec-allocated buffer to your ByteBuffer, and presumably another copy to get the data back out into your JNI code. I expect what you're seeing is the overhead cost rather than software decoding cost.
You might be able to improve matters by sending the output to a SurfaceTexture, rendering into an FBO or pbuffer, and then using glReadPixels to extract the data. If you read into a "direct" ByteBuffer or call glReadPixels from native code, you reduce your JNI overhead. The down side to this approach is that your data will be in RGB rather than YCbCr. (OTOH, if your desired transformations can be expressed in a GLES 2.0 fragment shader, you can get the GPU to do the work instead of the CPU.)
As noted in another answer, the decoders on different devices output ByteBuffer data in different formats, so interpreting the data in software may not be viable if portability is important to you.
Edit: Grafika now has an example of using the GPU to do image processing. You can see a demo video here.
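To illustrate the "direct" ByteBuffer suggestion above, here is a minimal sketch of allocating a readback buffer sized for one RGBA frame. The class ReadbackBuffer is a hypothetical name, and 4 bytes per pixel assumes the pixels are read back as GL_RGBA with GL_UNSIGNED_BYTE; the GL call itself is only indicated in a comment because it needs a live GLES context on the device.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper: allocates a direct buffer large enough for one RGBA frame. */
public final class ReadbackBuffer {
    /** Assumes GL_RGBA / GL_UNSIGNED_BYTE readback: 4 bytes per pixel. */
    public static final int BYTES_PER_PIXEL_RGBA = 4;

    private ReadbackBuffer() {}

    public static ByteBuffer allocate(int width, int height) {
        if (width <= 0 || height <= 0) {
            throw new IllegalArgumentException("width and height must be positive");
        }
        long bytes = (long) width * height * BYTES_PER_PIXEL_RGBA;
        if (bytes > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("frame too large for a single ByteBuffer");
        }
        // A direct buffer lives outside the Java heap, so native code can obtain
        // its address (JNI GetDirectBufferAddress) without an extra copy.
        return ByteBuffer.allocateDirect((int) bytes).order(ByteOrder.nativeOrder());
    }
}

// On the device, with the FBO bound, the readback would look something like:
//   ByteBuffer pixels = ReadbackBuffer.allocate(width, height);
//   GLES20.glReadPixels(0, 0, width, height,
//           GLES20.GL_RGBA, GLES20.GL_UNSIGNED_BYTE, pixels);
```

Allocating the buffer once and reusing it for every frame avoids repeated direct allocations, which tend to be comparatively expensive.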