I have two videos that I'd like to combine into a single video, in which both videos would sit on top of a static background image. (Think something like this.) My requirements are that the software I use is free, that it runs on OSX, and that I don't have to re-encode my videos an excessive number of times. I'd also like to be able to perform this operation from the command line or via script, since I'll be doing it a lot. (But this isn't strictly necessary.)
I tried fiddling with ffmpeg for a couple of hours, but it just doesn't seem very well suited for post-processing. I could potentially hack something together via the overlay feature, but so far I haven't figured out how to do it, aside from pain-stakingly converting the image to a video (which takes 2x as long as the length of my videos!) and then superimposing the two videos onto it in another rendering step.
Any tips? Thank you!
Update:
Thanks to LordNeckbeard's help, I was able to achieve my desired result with a single ffmpeg call! Unfortunately, encoding is quite slow, taking 6 seconds to encode 1 second of video. I believe this is caused by the background image. Any tips on speeding up encoding? Here's the ffmpeg log:
MacBook-Pro:Video archagon$ ffmpeg -loop 1 -i underlay.png -i test-slide-video-short.flv -i test-speaker-video-short.flv -filter_complex "[1:0]scale=400:-1[a];[2:0]scale=320:-1[b];[0:0][a]overlay=0:0[c];[c][b]overlay=0:0" -shortest -t 5 -an output.mp4
ffmpeg version 1.0 Copyright (c) 2000-2012 the FFmpeg developers
built on Nov 14 2012 16:18:58 with Apple clang version 4.0 (tags/Apple/clang-421.0.60) (based on LLVM 3.1svn)
configuration: --prefix=/opt/local --enable-swscale --enable-avfilter --enable-libmp3lame --enable-libvorbis --enable-libopus --enable-libtheora --enable-libschroedinger --enable-libopenjpeg --enable-libmodplug --enable-libvpx --enable-libspeex --mandir=/opt/local/share/man --enable-shared --enable-pthreads --cc=/usr/bin/clang --arch=x86_64 --enable-yasm --enable-gpl --enable-postproc --enable-libx264 --enable-libxvid
libavutil 51. 73.101 / 51. 73.101
libavcodec 54. 59.100 / 54. 59.100
libavformat 54. 29.104 / 54. 29.104
libavdevice 54. 2.101 / 54. 2.101
libavfilter 3. 17.100 / 3. 17.100
libswscale 2. 1.101 / 2. 1.101
libswresample 0. 15.100 / 0. 15.100
libpostproc 52. 0.100 / 52. 0.100
Input #0, image2, from 'underlay.png':
Duration: 00:00:00.04, start: 0.000000, bitrate: N/A
Stream #0:0: Video: png, rgb24, 1024x768, 25 fps, 25 tbr, 25 tbn, 25 tbc
Input #1, flv, from 'test-slide-video-short.flv':
Metadata:
author :
copyright :
description :
keywords :
rating :
title :
presetname : Custom
videodevice : VGA2USB Pro V3U30343
videokeyframe_frequency: 5
canSeekToEnd : false
createdby : FMS 3.5
creationdate : Mon Aug 16 16:35:34 2010
encoder : Lavf54.29.104
Duration: 00:50:32.75, start: 0.000000, bitrate: 90 kb/s
Stream #1:0: Video: vp6f, yuv420p, 640x480, 153 kb/s, 8 tbr, 1k tbn, 1k tbc
Input #2, flv, from 'test-speaker-video-short.flv':
Metadata:
author :
copyright :
description :
keywords :
rating :
title :
presetname : Custom
videodevice : Microsoft DV Camera and VCR
videokeyframe_frequency: 5
audiodevice : Microsoft DV Camera and VCR
audiochannels : 1
audioinputvolume: 75
canSeekToEnd : false
createdby : FMS 3.5
creationdate : Mon Aug 16 16:35:34 2010
encoder : Lavf54.29.104
Duration: 00:50:38.05, start: 0.000000, bitrate: 238 kb/s
Stream #2:0: Video: vp6f, yuv420p, 320x240, 204 kb/s, 25 tbr, 1k tbn, 1k tbc
Stream #2:1: Audio: mp3, 22050 Hz, mono, s16, 32 kb/s
File 'output.mp4' already exists. Overwrite ? [y/N] y
using cpu capabilities: none!
[libx264 @ 0x7fa84c02f200] profile High, level 3.1
[libx264 @ 0x7fa84c02f200] 264 - core 119 - H.264/MPEG-4 AVC codec - Copyleft 2003-2011 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=3 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'output.mp4':
Metadata:
encoder : Lavf54.29.104
Stream #0:0: Video: h264 ([33][0][0][0] / 0x0021), yuv420p, 1024x768, q=-1--1, 25 tbn, 25 tbc
Stream mapping:
Stream #0:0 (png) -> overlay:main
Stream #1:0 (vp6f) -> scale
Stream #2:0 (vp6f) -> scale
overlay -> Stream #0:0 (libx264)
Press [q] to stop, [?] for help
Update 2:
It works! One important tweak was to move the underlay.png input to the end of the input list. This increased performance substantially. Here's my final ffmpeg call. (The maps at the end aren't required for this particular arrangement, but I sometimes have a few extra audio inputs that I want to map to my output.)
ffmpeg
-i VideoOne.flv
-i VideoTwo.flv
-loop 1 -i Underlay.png
-filter_complex "[2:0] [0:0] overlay=20:main_h/2-overlay_h/2 [overlay];[overlay] [1:0] overlay=main_w-overlay_w-20:main_h/2-overlay_h/2 [output]"
-map [output]:v
-map 0:a
OutputVideo.m4v
Complex filtergraphs in ffmpeg may seem complicated at first, but it makes sense once you try it a few times. You need to be familiar with the filtergraph syntax. Start by reading Filtering Introduction and Filtergraph Description. You do not have to understand it completely but it will help you understand the following example.
Example
Use the
scale
video filter to scale (resize) the inputs to a specific size, and then use theoverlay
video filter to place the videos over the static images.What this means
Non-filtering options:
-loop 1
Continuously loop the next input which isbackground.png
.background.png
The background image. The stream specifier is[0:v]
It is sized 1280x720.video1.mp4
This first video input (Big Buck Bunny in the example image). The stream specifier is[1:v]
. It is sized 640x360.video2.mp4
This second video input (the varmints in the example image). The stream specifier is[2:v]
. It is sized 640x360.Filtering options
-filter_complex
The option to start the complex filtergraph.[1:v]scale=(iw/2)-20:-1[a]
This is takingvideo1.mp4
, referred to as[1:v]
, and scaling it.iw
is an alias for input width, and in this case it is a value of 640. We divide than in half and subtract an additional 20 pixels as padding so there will be space around each video when it is overlaid.-1
means to automatically calculate a value that will preserve aspect. If course you can omit the fanciness and manually provide values such asscale=320:240
. Then use an output link label named[a]
so we can refer to this output later.[2:v]scale=(iw/2)-20:-1[b]
Same as above, but usevideo2.mp4
as the input and name the output link label as[b]
.[0:v][a]overlay=10:(main_h/2)-(overlay_h/2):shortest=1[c]
Usebackground.png
as first overlay input, and use the results of our first scale filter, referred to as[a]
, as the second overlay input. Place[a]
over[0:v]
.main_h
is an alias for main height which refers to the background input ([0:v]
) height.overlay_h
is an alias for overlay height and refers to the height of the foreground ([a]
). This example will place Big Buck Bunny on the left side.shortest=1
will force the output to terminate when the shortest input terminates; otherwise it will loop forever sincebackground.png
is looping. Name the results of this filter[c]
.[c][b]overlay=overlay_w*2:overlay_h:shortest=1[video]
Use[c]
as the first overlay input and[b]
as the second overlay input. Using overlay parametersoverlay_w
andoverlay_h
(overlay input width and height). This example will place the verminy varmints on the right side. Label the output as[video]
.-map "[video]"
map the output from the filter to the output file. The[video]
link label at the end of the filtergraph is not necessarily required but it is recommended to be explicit with mapping.Audio
Have two separate audio streams
By default only the first input audio channel encountered will be used in the output as defined in Stream Selection. You can use the
-map
option to add an additional audio track from the second video input (the output will have two audio streams). This example will stream copy the audio instead of re-encoding:Combine both audio streams
Or combine both audio inputs into one using the
amerge
andpan
audio filters (assuming both inputs are stereo and you want stereo output):Also see