I have multiple video files of a conference call. However, each participant joined the call at a different time, so each video file has a different start-time offset:
Video    Start Time
------   ----------
Video1   00:00
Video2   00:10
Video3   01:40
My goal is to play back this conference. However, it was not recorded as a single video; it was recorded as multiple video files instead.
How do I stitch these videos?
There is also a paid solution that merges the video fragments into a single clip, which would make the client side much simpler. But can I do it for free?
The expected outcome is one video showing the three videos in a grid.
When ffmpeg stitches the videos, it should take their start times into account so that the videos play back in sync.
Use -itsoffset to specify the offset (in s.msec) of the individual input streams. The value is added to the input's timestamps, so a positive offset delays a stream and a negative offset advances it. You will have to experiment with the offsets depending on your input streams.
For example:
ffmpeg \
-itsoffset -1 -i video.mp4 \
-itsoffset -2 -i video.mp4 \
-itsoffset -3 -i video.mp4 \
-filter_complex hstack=inputs=3 \
-c:v libx264 -crf 23 out.mp4
This gives you the video streams stacked next to each other using the hstack filter, offset by one second each.
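Applied to the timings in the question, here is a minimal sketch that pads the late joiners instead of shifting timestamps, so all three streams line up from 00:00. The filenames video1.mp4 through video3.mp4 are placeholders, and the inputs are assumed to share the same frame height (hstack requires it):
# Pad video2 with 10 s and video3 with 100 s of black leader,
# then stack the three streams side by side.
ffmpeg \
-i video1.mp4 \
-i video2.mp4 \
-i video3.mp4 \
-filter_complex \
"[1:v]tpad=start_duration=10:color=black[v1]; \
[2:v]tpad=start_duration=100:color=black[v2]; \
[0:v][v1][v2]hstack=inputs=3[outv]" \
-map "[outv]" -c:v libx264 -crf 23 out.mp4
The audio streams can be delayed the same way with the adelay filter, which takes per-channel delays in milliseconds (e.g., adelay=10000|10000 for 10 s on a stereo stream).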
You can also use a more complex filtergraph to generate a black background (e.g., 1280×720 with a 10 s duration), overlay the stacked videos on it, and merge the audio streams:
ffmpeg \
-itsoffset -1 -i video.mp4 \
-itsoffset -2 -i video.mp4 \
-itsoffset -3 -i video.mp4 \
-filter_complex \
"color=c=black:s=1280x720:d=10[black]; \
[0:v][1:v][2:v]hstack=inputs=3[stacked]; \
[0:a][1:a][2:a]amerge=inputs=3[outa]; \
[black][stacked]overlay=(main_w-overlay_w)/2:(main_h-overlay_h)/2[outv]" \
-map "[outv]" -map "[outa]" -c:v libx264 -crf 23 output.mp4