I have multiple video files of a conference call. However, each participant joined the call at a different time, so each video file has a different start-time offset:
Video    Start Time
------   ----------
Video1   00:00
Video2   00:10
Video3   01:40
My goal is to play back this conference. However, it was not recorded as a single video; it was recorded as multiple video files instead.
How do I stitch these videos?
There is also a paid solution that merges the video fragments into a single clip, which would make the client side much simpler. But can I do it for free?
The expected outcome is one video showing the three videos in a grid.
When ffmpeg stitches the videos, it should take their start times into account so that the videos play back in sync.
Use -itsoffset to specify the offset (in s.msec) of the individual input streams. The value is added to the input's timestamps, so a positive offset delays a stream and a negative offset advances it. You will have to experiment with the offsets depending on your input streams.
For example:
ffmpeg \
-itsoffset -1 -i video.mp4 \
-itsoffset -2 -i video.mp4 \
-itsoffset -3 -i video.mp4 \
-filter_complex hstack=inputs=3 \
-c:v libx264 -crf 23 out.mp4
This gives you the video streams stacked next to each other using the hstack filter, offset by one second each.
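Applied to the timings in the question, here is a minimal sketch that pads the late joiners instead of shifting timestamps, so all three streams line up from 00:00. The filenames video1.mp4 through video3.mp4 are placeholders, and the inputs are assumed to share the same frame height (hstack requires it):
# Pad video2 with 10 s and video3 with 100 s of black leader,
# then stack the three streams side by side.
ffmpeg \
-i video1.mp4 \
-i video2.mp4 \
-i video3.mp4 \
-filter_complex \
"[1:v]tpad=start_duration=10:color=black[v1]; \
[2:v]tpad=start_duration=100:color=black[v2]; \
[0:v][v1][v2]hstack=inputs=3[outv]" \
-map "[outv]" -c:v libx264 -crf 23 out.mp4
The audio streams can be delayed the same way with the adelay filter, which takes per-channel delays in milliseconds (e.g., adelay=10000|10000 for 10 s on a stereo stream).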
You can also use a more complex filtergraph to generate a black background (e.g., 1280×720 with a 10 s duration), overlay the stacked videos on it, and merge the audio streams:
ffmpeg \
-itsoffset -1 -i video.mp4 \
-itsoffset -2 -i video.mp4 \
-itsoffset -3 -i video.mp4 \
-filter_complex \
"color=c=black:s=1280x720:d=10[black]; \
[0:v][1:v][2:v]hstack=inputs=3[stacked]; \
[0:a][1:a][2:a]amerge=inputs=3[outa]; \
[black][stacked]overlay=(main_w-overlay_w)/2:(main_h-overlay_h)/2[outv]" \
-map "[outv]" -map "[outa]" -c:v libx264 -crf 23 output.mp4