I'd like to write a simple Linux CLI application that can take two video sources (one of a presenter talking, and one of their slides with no audio) and merge them.
I'd like the entire output video to be the two original videos, side by side. Failing that, my second best option would be a "picture in picture" style video, with the presenter in a small frame in the corner.
After a few hours of research, GStreamer looks like it might be able to do this. Can anyone confirm that before I spend more time trying it out?
If it can't, are there other APIs out there that I might be able to use?
Here is a simple (working) setup using gst-launch (install the gstreamer-tools package on Ubuntu/Debian):
gst-launch v4l2src device=/dev/video1 ! videoscale ! ffmpegcolorspace ! video/x-raw-yuv, width=640, height=480 ! videobox border-alpha=0 left=-640 ! videomixer name=mix ! ffmpegcolorspace ! xvimagesink v4l2src ! videoscale ! ffmpegcolorspace ! video/x-raw-yuv, width=640, height=480 ! videobox right=-640 ! mix.
This reads two video streams using Video4Linux2 (v4l2src), one from the default device and the other from /dev/video1. Change the device paths if your setup is different.
The first part (everything up to and including xvimagesink) reads the video from the capture device, negotiates a size and colorspace (videoscale ! ffmpegcolorspace), forces a specific video format (video/x-raw-yuv, width=640, height=480), adds 640 transparent pixels to the left (thereby moving the picture to the right) and feeds the result into a videomixer named "mix". Finally it auto-negotiates the colorspace again and displays the mixed result in an XVideo window.
The second part (starting at the second v4l2src) reads the second video stream (from the default capture device; add device=/dev/videoX to choose a different device), then does the same colorspace, size negotiation and video format selection as for the first stream. It then adds 640 pixels of border on the right, so this picture occupies the left half of the output, and feeds the result to the element named mix (our video mixer). The dot at the end is required: it tells gst-launch to link to the existing element named "mix" instead of looking for an element type with that name.
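As a rough illustration of how the two branches fit together, here is a minimal Python sketch that assembles the same pipeline description as a string. The function names (capture_branch, side_by_side_pipeline) and the use of None for "default device" are made up for this example; only the element names and properties come from the command above.

```python
def capture_branch(device, width, height, box):
    """One capture branch: scale, convert, force the format, pad with videobox."""
    # device=None means "let v4l2src pick the default capture device".
    source = "v4l2src" if device is None else f"v4l2src device={device}"
    caps = f"video/x-raw-yuv, width={width}, height={height}"
    return f"{source} ! videoscale ! ffmpegcolorspace ! {caps} ! videobox {box}"


def side_by_side_pipeline(left_device, right_device, width=640, height=480):
    """Build a gst-launch pipeline description for two capture devices."""
    # Right-hand picture: pad its left edge with `width` transparent pixels.
    right = capture_branch(right_device, width, height,
                           f"border-alpha=0 left=-{width}")
    # Left-hand picture: pad its right edge by the same amount.
    left = capture_branch(left_device, width, height, f"right=-{width}")
    # The first branch creates the mixer; the second links to it via "mix."
    return (f"{right} ! videomixer name=mix ! ffmpegcolorspace ! xvimagesink "
            f"{left} ! mix.")


if __name__ == "__main__":
    # Prints a command equivalent to the one shown above.
    print("gst-launch " + side_by_side_pipeline(None, "/dev/video1"))
```

With the default 640x480 size and /dev/video1 on the right, this reproduces the command above. The printed line can be pasted into a shell, or split on whitespace and handed to subprocess if you want to launch it from the script.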
You could replace v4l2src device=/dev/video1 with filesrc location=video.avi ! decodebin to get the input from a video file.
Replace xvimagesink with jpegenc ! avimux ! filesink location=out.avi to write the result to a video file.
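The two substitutions are plain text replacements on the pipeline description, so they can be sketched in a few lines of Python. The helper names (read_from_file, write_to_file) and the LIVE_PIPELINE constant are invented for this example; the replacement fragments are exactly the ones suggested above.

```python
# The live-capture pipeline from above, without the leading "gst-launch".
LIVE_PIPELINE = (
    "v4l2src device=/dev/video1 ! videoscale ! ffmpegcolorspace ! "
    "video/x-raw-yuv, width=640, height=480 ! "
    "videobox border-alpha=0 left=-640 ! videomixer name=mix ! "
    "ffmpegcolorspace ! xvimagesink "
    "v4l2src ! videoscale ! ffmpegcolorspace ! "
    "video/x-raw-yuv, width=640, height=480 ! videobox right=-640 ! mix."
)


def read_from_file(pipeline, path):
    """Swap the /dev/video1 capture source for a decoded video file."""
    return pipeline.replace("v4l2src device=/dev/video1",
                            f"filesrc location={path} ! decodebin", 1)


def write_to_file(pipeline, path):
    """Swap the on-screen sink for an MJPEG-in-AVI file writer."""
    return pipeline.replace("xvimagesink",
                            f"jpegenc ! avimux ! filesink location={path}", 1)


if __name__ == "__main__":
    pipeline = write_to_file(read_from_file(LIVE_PIPELINE, "video.avi"),
                             "out.avi")
    print("gst-launch " + pipeline)
```

This only rewrites the first branch's source; the second branch still reads from the default capture device, and paths containing spaces would need quoting before the string is passed to a shell.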
It turns out GStreamer can merge two videos, placing them side by side in an output video, using the videomixer element.
A basic pipeline that takes two input files, scales them to the same size, then merges them and encodes the result as a Theora video might look like this:
filesrc -> decodebin -> ffmpegcolorspace -> videoscale -> videobox -> videorate
                                                                               \
filesrc -> decodebin -> ffmpegcolorspace -> videoscale -> videorate -> videomixer -> ffmpegcolorspace -> theoraenc -> oggmux -> filesink
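For anyone who wants to try this diagram with gst-launch before picking a language binding, here is an untested Python sketch that writes it out as a pipeline description. The helper names, the file names, the 640x480 caps filter and the videobox properties are assumptions (borrowed from the gst-launch answer in this thread); the diagram itself does not specify them. It handles video only, so the presenter's audio is not carried over.

```python
def chain(*elements):
    """Join elements with gst-launch's link operator."""
    return " ! ".join(elements)


def merge_to_theora(boxed_input, plain_input, output, width=640, height=480):
    """Write the two-branch diagram as a gst-launch pipeline description."""
    # Assumed caps filter so both inputs are scaled to the same size.
    caps = f"video/x-raw-yuv, width={width}, height={height}"
    # Lower branch of the diagram: owns the mixer, the encoder and the sink.
    main = chain(f"filesrc location={plain_input}", "decodebin",
                 "ffmpegcolorspace", "videoscale", caps, "videorate",
                 "videomixer name=mix", "ffmpegcolorspace",
                 "theoraenc", "oggmux", f"filesink location={output}")
    # Upper branch: shifted right by `width` pixels, then linked to "mix."
    boxed = chain(f"filesrc location={boxed_input}", "decodebin",
                  "ffmpegcolorspace", "videoscale", caps,
                  f"videobox border-alpha=0 left=-{width}", "videorate",
                  "mix.")
    return f"{main} {boxed}"


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print("gst-launch " + merge_to_theora("slides.avi", "presenter.avi",
                                          "out.ogv"))
```

Only one branch needs a videobox here: videomixer sizes its output to the largest input, so the padded branch defines the double-width canvas and the other branch lands in the left half.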
How you implement this pipeline depends on the language. I prototyped with the Ruby bindings, and it works really well.
AviSynth comes to mind. I used it many years ago under Windows, and it is pretty good at arbitrary post-processing. AviSynth v3 is supposed to run natively under Linux but is still far from ready. There are tools to run earlier versions with Wine, though.
MEncoder can do that natively on Linux. You can fork its code or invoke the binary.