I am trying to apply effects to the frames of a video using the GPU and then to re-encode those frames into a new result video.
In the interest of performance I have implemented the following flow:
There are 3 different threads, each with its own OpenGL context. These contexts are set up so that they share textures with each other.
Thread 1 extracts frames from the video and holds them in the GPU memory as textures, similar to this example.
Thread 2 processes the textures using a modified version of GPUImage that also outputs textures in the GPU memory.
Finally, thread 3 writes the textures obtained from thread 2 into a new video file, similar to the method described here.
Frame order is maintained using queues between threads 1 and 2, and threads 2 and 3. Textures are deleted from memory manually after they are used for processing / writing.
The whole point of this flow is to separate the stages, in the hope that overall throughput will be limited only by the slowest of the 3 threads.
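To make the structure concrete, here is a minimal sketch of the queue layout in plain Java. It is not my actual code: integer ids stand in for the GL textures, and `ThreeStagePipeline`, `END` and the queue capacity of 4 are names and values made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class ThreeStagePipeline {
    private static final int END = -1; // hypothetical end-of-stream marker

    /** Runs extract -> process -> write on three threads and returns the ids the writer saw, in order. */
    static List<Integer> run(int frameCount) {
        // Bounded FIFO queues keep frame order and stop a fast stage from running far ahead.
        BlockingQueue<Integer> decoded = new ArrayBlockingQueue<>(4);
        BlockingQueue<Integer> processed = new ArrayBlockingQueue<>(4);
        List<Integer> written = new ArrayList<>();

        Thread extractor = new Thread(() -> {
            try {
                for (int i = 0; i < frameCount; i++) {
                    decoded.put(i); // blocks while the queue is full
                }
                decoded.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread processor = new Thread(() -> {
            try {
                for (int id; (id = decoded.take()) != END; ) {
                    processed.put(id); // the real stage would apply the effect here
                }
                processed.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread writer = new Thread(() -> {
            try {
                for (int id; (id = processed.take()) != END; ) {
                    written.add(id); // the real stage would encode the frame here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        extractor.start();
        processor.start();
        writer.start();
        try {
            extractor.join();
            processor.join();
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return written;
    }
}
```

The queues only hand over ids in order; they say nothing about whether the GPU has finished writing a texture by the time the next thread reads it.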
THE PROBLEM:
About 90% of the frames in the final video are black; only a few of them are correct.
I have checked the individual results of extraction and processing, and they work as expected. Also note that the 3 components described above work just fine together when run in a single thread.
I have tried to synchronise thread 1 and thread 3, and after adding an extra 100ms of sleep time to thread 1 the video turns out just fine, with maybe 1 or 2 black frames. It seems to me that the decoder and encoder instances are unable to work simultaneously.
I will edit this post with any extra requested details.
Sharing textures between OpenGL ES contexts requires some care. The way it's implemented in Grafika's "show + capture camera" Activity is broken; see this issue for details. The basic problem is that you essentially need to issue memory barriers when the texture is updated; in practical terms that means issuing glFinish() on the producer side, re-binding the texture on the consumer side, and doing all of this in synchronized blocks.
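A rough sketch of that handoff, under some assumptions: `GlCalls` and `SharedTextureHandoff` are hypothetical names, and the interface exists only so the example runs off-device. On Android the two methods would delegate to `GLES20.glFinish()` and `GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, id)`, each called on the thread that owns the corresponding context.

```java
import java.util.ArrayDeque;

/** Hypothetical seam over the two GL calls, so the sketch can run without a device. */
interface GlCalls {
    void finish();                   // on Android: GLES20.glFinish()
    void bindTexture(int textureId); // on Android: GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, textureId)
}

/** Passes texture ids from a producer context to a consumer context, in order. */
final class SharedTextureHandoff {
    private final Object lock = new Object();
    private final ArrayDeque<Integer> pending = new ArrayDeque<>();

    /** Producer thread: call after the last draw call that writes to textureId. */
    void publish(GlCalls producerGl, int textureId) {
        synchronized (lock) {
            // Wait for the producer's GL commands to complete before
            // the consumer is allowed to see the texture id.
            producerGl.finish();
            pending.addLast(textureId);
            lock.notifyAll();
        }
    }

    /** Consumer thread: blocks until a texture is ready, re-binds it, returns its id (-1 if interrupted). */
    int acquire(GlCalls consumerGl) {
        synchronized (lock) {
            while (pending.isEmpty()) {
                try {
                    lock.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return -1;
                }
            }
            int textureId = pending.removeFirst();
            // Re-bind in the consumer context so it picks up the updated contents.
            consumerGl.bindTexture(textureId);
            return textureId;
        }
    }
}
```

Calling glFinish() while holding the lock is simple but stalls the producer on every frame, which is part of why the single-context layout is usually the better choice.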
Your life will be simpler (and more efficient) if you can do all of the GLES work on a single thread. In my experience, having more than one GLES context active at a time is unwise, and you'll save yourself some pain by finding an alternative.
You probably want something more like this:
- Thread #1 reads the file and feeds frames into a MediaCodec decoder. The decoder sends the output to a SurfaceTexture Surface.
- Thread #2 has the GLES context. It created the SurfaceTexture that thread #1 is sending the output to. It processes the images and renders the output on the Surface of a MediaCodec encoder.
- Thread #3, which created the MediaCodec encoder, sits waiting for the encoded output. As output is received it's written to disk. Note that the use of MediaMuxer can stall; see this blog post for more.
In all cases, the only communication between threads (and, under the hood, processes) is done through Surface. The SurfaceTexture and MediaCodec instances are created and used from a single thread; only the producer endpoint (the Surface) is passed around.
One potential trouble point is with flow control -- SurfaceTextures will drop frames if you feed them too quickly. Combining threads #1 and #2 might make sense depending on circumstances.
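One way to handle the flow control is to allow only one frame in flight: after releasing a decoded buffer to the Surface, wait until the SurfaceTexture reports the frame before releasing the next one. Below is a minimal sketch of such a gate in plain Java; `FrameGate` and its method names are made up for the illustration, and the Android calls appear only in comments.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper that lets at most one frame be in flight on the SurfaceTexture. */
final class FrameGate {
    private final Object lock = new Object();
    private boolean frameAvailable;

    /** Call from SurfaceTexture.OnFrameAvailableListener.onFrameAvailable(). */
    void signalFrameAvailable() {
        synchronized (lock) {
            frameAvailable = true;
            lock.notifyAll();
        }
    }

    /**
     * Call on the GLES thread after MediaCodec.releaseOutputBuffer(index, true)
     * and before SurfaceTexture.updateTexImage().
     * Returns false if no frame arrived within timeoutMs or the thread was interrupted.
     */
    boolean awaitFrame(long timeoutMs) {
        synchronized (lock) {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
            while (!frameAvailable) {
                long remainingNs = deadline - System.nanoTime();
                if (remainingNs <= 0) {
                    return false;
                }
                try {
                    TimeUnit.NANOSECONDS.timedWait(lock, remainingNs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            frameAvailable = false; // consume the signal so the next wait blocks again
            return true;
        }
    }
}
```

With threads #1 and #2 combined, the loop becomes: release one output buffer with rendering enabled, call `awaitFrame`, call updateTexImage(), draw, repeat. The decoder then cannot get ahead of the renderer, so nothing is dropped.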