My application needs to do some processing on live camera frames on the CPU, before rendering them on the GPU. There's also some other stuff being rendered on the GPU which is dependent on the results of the CPU processing; therefore it's important to keep everything synchronised so we don't render the frame itself on the GPU until the results of the CPU processing for that frame are also available.
The question is what's the lowest overhead approach for this on android?
The CPU processing in my case just needs a greyscale image, so a YUV format where the Y plane is packed is ideal (and tends to be a good match to the native format of the camera devices too). NV12, NV21 or fully planar YUV would all provide ideal low-overhead access to greyscale, so that would be preferred on the CPU side.
In the original camera API the setPreviewCallbackWithBuffer() was the only sensible way to get data onto the CPU for processing. This had the Y plane separate so was ideal for the CPU processing. Getting this frame available to OpenGL for rendering in a low overhead way was the more challenging aspect. In the end I wrote a NEON color conversion routine to output RGB565 and just use glTexSubImage2d to get this available on the GPU. This was first implemented in the Nexus 1 timeframe, where even a 320x240 glTexSubImage2d call took 50ms of CPU time (poor drivers trying to do texture swizzling I presume - this was significantly improved in a system update later on).
Back in the day I looked into things like eglImage extensions, but they don't seem to be available or well documented enough for user apps. I had a little look into the internal android GraphicsBuffer classes but ideally want to stay in the world of supported public APIs.
The android.hardware.camera2 API had promise with being able to attach both an ImageReader and a SurfaceTexture to a capture session. Unfortunately I can't see any way of ensuring the right sequential pipeline here - holding off calling updateTexImage() until the CPU has processed is easy enough, but if another frame has arrived during that processing then updateTexImage() will skip straight to the latest frame. It also seems with multiple outputs there will be independent copies of the frames in each of the queues that ideally I'd like to avoid.
Ideally this is what I'd like:
- Camera driver fills some memory with the latest frame
- CPU obtains pointer to the data in memory, can read Y data without a copy being made
- CPU processes data and sets a flag in my code when frame is ready
- When beginning to render a frame, check if a new frame is ready
- Call some API to bind the same memory as a GL texture
- When a newer frame is ready, release the buffer holding the previous frame back into the pool
I can't see a way of doing exactly that zero-copy style with public API on android, but what's the closest that it's possible to get?
One crazy thing I tried that seems to work, but is not documented: The ANativeWindow NDK API can accept data NV12 format, even though the appropriate format constant is not one of the ones in the public headers. That allows a SurfaceTexture to be filled with NV12 data by memcpy() to avoid CPU-side colour conversion and any swizzling that happens driver side in glTexImage2d. That is still an extra copy of the data though that feels like it should be unnecessary, and again as it's undocumented might not work on all devices. A supported sequential zero-copy Camera -> ImageReader -> SurfaceTexture or equivalent would be perfect.