I have two videos that I'd like to combine into a single video, in which both videos would sit on top of a static background image. (Think something like this.) My requirements are that the software I use is free, that it runs on OSX, and that I don't have to re-encode my videos an excessive number of times. I'd also like to be able to perform this operation from the command line or via script, since I'll be doing it a lot. (But this isn't strictly necessary.)
I tried fiddling with ffmpeg for a couple of hours, but it just doesn't seem very well suited for post-processing. I could potentially hack something together via the overlay feature, but so far I haven't figured out how to do it, aside from pain-stakingly converting the image to a video (which takes 2x as long as the length of my videos!) and then superimposing the two videos onto it in another rendering step.
Any tips? Thank you!
Update:
Thanks to LordNeckbeard's help, I was able to achieve my desired result with a single ffmpeg call! Unfortunately, encoding is quite slow, taking 6 seconds to encode 1 second of video. I believe this is caused by the background image. Any tips on speeding up encoding? Here's the ffmpeg log:
MacBook-Pro:Video archagon$ ffmpeg -loop 1 -i underlay.png -i test-slide-video-short.flv -i test-speaker-video-short.flv -filter_complex "[1:0]scale=400:-1[a];[2:0]scale=320:-1[b];[0:0][a]overlay=0:0[c];[c][b]overlay=0:0" -shortest -t 5 -an output.mp4
ffmpeg version 1.0 Copyright (c) 2000-2012 the FFmpeg developers
built on Nov 14 2012 16:18:58 with Apple clang version 4.0 (tags/Apple/clang-421.0.60) (based on LLVM 3.1svn)
configuration: --prefix=/opt/local --enable-swscale --enable-avfilter --enable-libmp3lame --enable-libvorbis --enable-libopus --enable-libtheora --enable-libschroedinger --enable-libopenjpeg --enable-libmodplug --enable-libvpx --enable-libspeex --mandir=/opt/local/share/man --enable-shared --enable-pthreads --cc=/usr/bin/clang --arch=x86_64 --enable-yasm --enable-gpl --enable-postproc --enable-libx264 --enable-libxvid
libavutil 51. 73.101 / 51. 73.101
libavcodec 54. 59.100 / 54. 59.100
libavformat 54. 29.104 / 54. 29.104
libavdevice 54. 2.101 / 54. 2.101
libavfilter 3. 17.100 / 3. 17.100
libswscale 2. 1.101 / 2. 1.101
libswresample 0. 15.100 / 0. 15.100
libpostproc 52. 0.100 / 52. 0.100
Input #0, image2, from 'underlay.png':
Duration: 00:00:00.04, start: 0.000000, bitrate: N/A
Stream #0:0: Video: png, rgb24, 1024x768, 25 fps, 25 tbr, 25 tbn, 25 tbc
Input #1, flv, from 'test-slide-video-short.flv':
Metadata:
author :
copyright :
description :
keywords :
rating :
title :
presetname : Custom
videodevice : VGA2USB Pro V3U30343
videokeyframe_frequency: 5
canSeekToEnd : false
createdby : FMS 3.5
creationdate : Mon Aug 16 16:35:34 2010
encoder : Lavf54.29.104
Duration: 00:50:32.75, start: 0.000000, bitrate: 90 kb/s
Stream #1:0: Video: vp6f, yuv420p, 640x480, 153 kb/s, 8 tbr, 1k tbn, 1k tbc
Input #2, flv, from 'test-speaker-video-short.flv':
Metadata:
author :
copyright :
description :
keywords :
rating :
title :
presetname : Custom
videodevice : Microsoft DV Camera and VCR
videokeyframe_frequency: 5
audiodevice : Microsoft DV Camera and VCR
audiochannels : 1
audioinputvolume: 75
canSeekToEnd : false
createdby : FMS 3.5
creationdate : Mon Aug 16 16:35:34 2010
encoder : Lavf54.29.104
Duration: 00:50:38.05, start: 0.000000, bitrate: 238 kb/s
Stream #2:0: Video: vp6f, yuv420p, 320x240, 204 kb/s, 25 tbr, 1k tbn, 1k tbc
Stream #2:1: Audio: mp3, 22050 Hz, mono, s16, 32 kb/s
File 'output.mp4' already exists. Overwrite ? [y/N] y
using cpu capabilities: none!
[libx264 @ 0x7fa84c02f200] profile High, level 3.1
[libx264 @ 0x7fa84c02f200] 264 - core 119 - H.264/MPEG-4 AVC codec - Copyleft 2003-2011 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=3 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'output.mp4':
Metadata:
encoder : Lavf54.29.104
Stream #0:0: Video: h264 ([33][0][0][0] / 0x0021), yuv420p, 1024x768, q=-1--1, 25 tbn, 25 tbc
Stream mapping:
Stream #0:0 (png) -> overlay:main
Stream #1:0 (vp6f) -> scale
Stream #2:0 (vp6f) -> scale
overlay -> Stream #0:0 (libx264)
Press [q] to stop, [?] for help
Update 2:
It works! One important tweak was to move the underlay.png input to the end of the input list. This increased performance substantially. Here's my final ffmpeg call. (The maps at the end aren't required for this particular arrangement, but I sometimes have a few extra audio inputs that I want to map to my output.)
ffmpeg
-i VideoOne.flv
-i VideoTwo.flv
-loop 1 -i Underlay.png
-filter_complex "[2:0] [0:0] overlay=20:main_h/2-overlay_h/2 [overlay];[overlay] [1:0] overlay=main_w-overlay_w-20:main_h/2-overlay_h/2 [output]"
-map [output]:v
-map 0:a
OutputVideo.m4v