Concatenate audio with image and video using ffmpeg

Posted 2019-08-24 04:00

I have one image, one audio file, and one video. I would like to merge them into a single video that will:

  • show the image and play audio file for the first 10s
  • play the video file

Here is what I have tried so far:

ffmpeg \
-loop 1 -framerate 24 -t 10 -i item1.jpg \
-i "https://audio-ssl.itunes.apple.com/apple-assets-us-std-000001/Music/66/58/f7/mzi.eoocfriy.aac.p.m4a" \
-i item4.mp4 \
-filter_complex \
"[0]scale=432:432,setdar=1[img1]; \
 [1]volume=1[aud1]; \
 [2]scale=432:432,setdar=1[vid1]; \ 
 [img1][aud1][vid1] concat=n=3:v=1:a=1" \
outputfile.mp4

I got the error:

[Parsed_setdar_4 @ 0x3063780] Media type mismatch between the 'Parsed_setdar_4' filter output pad 0 (video) and the 'Parsed_concat_6' filter input pad 1 (audio)
[AVFilterGraph @ 0x30479a0] Cannot create the link setdar:0 -> concat:1
Error initializing complex filters.
Invalid argument

I tried searching but still cannot figure out what I am doing wrong.

Updated: I ran the following command:

ffmpeg \
-loop 1 -framerate 24 -t 10 -i item1.jpg \
-t 10 -i "https://audio-ssl.itunes.apple.com/apple-assets-us-std-000001/Music/66/58/f7/mzi.eoocfriy.aac.p.m4a" \
-i item4.mp4 \
-f lavfi -t 1 -i anullsrc \
-filter_complex \
"[0]scale=432:432,setsar=1[img1]; \
[2]scale=432:432,setsar=1[vid1]; \ 
[img1][1][vid1][3] concat=n=2:v=1:a=1" \
outputfile.mp4

and got the following error:

ffmpeg version 3.3.3 Copyright (c) 2000-2017 the FFmpeg developers
  built with gcc 4.8 (Ubuntu 4.8.4-2ubuntu1~14.04.3)
  configuration: --extra-libs=-ldl --prefix=/opt/ffmpeg --mandir=/usr/share/man --enable-avresample --disable-debug --enable-nonfree --enable-gpl --enable-version3 --enable-libopencore-amrnb --enable-libopencore-amrwb --disable-decoder=amrnb --disable-decoder=amrwb --enable-libpulse --enable-libfreetype --enable-gnutls --disable-ffserver --enable-libx264 --enable-libx265 --enable-libfdk-aac --enable-libvorbis --enable-libtheora --enable-libmp3lame --enable-libopus --enable-libvpx --enable-libspeex --enable-libass --enable-avisynth --enable-libsoxr --enable-libxvid --enable-libvidstab --enable-libwavpack --enable-nvenc --enable-libzimg
  libavutil      55. 58.100 / 55. 58.100
  libavcodec     57. 89.100 / 57. 89.100
  libavformat    57. 71.100 / 57. 71.100
  libavdevice    57.  6.100 / 57.  6.100
  libavfilter     6. 82.100 /  6. 82.100
  libavresample   3.  5.  0 /  3.  5.  0
  libswscale      4.  6.100 /  4.  6.100
  libswresample   2.  7.100 /  2.  7.100
  libpostproc    54.  5.100 / 54.  5.100
Input #0, image2, from 'item1.jpg':
  Duration: 00:00:00.04, start: 0.000000, bitrate: 8365 kb/s
    Stream #0:0: Video: mjpeg, yuvj420p(pc, bt470bg/unknown/unknown), 432x432 [SAR 1:1 DAR 1:1], 24 fps, 24 tbr, 24 tbn, 24 tbc
Input #1, mov,mp4,m4a,3gp,3g2,mj2, from 'https://audio-ssl.itunes.apple.com/apple-assets-us-std-000001/Music/66/58/f7/mzi.eoocfriy.aac.p.m4a':
  Metadata:
    major_brand     : M4A
    minor_version   : 0
    compatible_brands: M4A mp42isom
    creation_time   : 1983-06-16T23:20:44.000000Z
    iTunSMPB        :  00000000 00000840 00000000 00000000001423C0 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  Duration: 00:00:29.98, start: 0.047891, bitrate: 285 kb/s
    Stream #1:0(eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 271 kb/s (default)
    Metadata:
      creation_time   : 1983-06-16T23:20:44.000000Z
Input #2, mov,mp4,m4a,3gp,3g2,mj2, from 'item4.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    creation_time   : 1970-01-01T00:00:00.000000Z
    encoder         : Lavf53.24.2
  Duration: 00:00:13.70, start: 0.000000, bitrate: 615 kb/s
    Stream #2:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 320x240 [SAR 1:1 DAR 4:3], 229 kb/s, 15 fps, 15 tbr, 15360 tbn, 30 tbc (default)
    Metadata:
      creation_time   : 1970-01-01T00:00:00.000000Z
      handler_name    : VideoHandler
    Stream #2:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, 5.1, fltp, 382 kb/s (default)
    Metadata:
      creation_time   : 1970-01-01T00:00:00.000000Z
      handler_name    : SoundHandler
Input #3, lavfi, from 'anullsrc':
  Duration: N/A, start: 0.000000, bitrate: 705 kb/s
    Stream #3:0: Audio: pcm_u8, 44100 Hz, stereo, u8, 705 kb/s
[AVFilterGraph @ 0x3955e20] No such filter: ' '
Error initializing complex filters.
Invalid argument

1 answer
beautiful°
#2 · 2019-08-24 04:34

When concatenating paired streams, the concat filter expects the same set of inputs for every segment: the video stream(s) first, then the audio stream(s). For example, if you are concatenating 1 video and 2 audio streams, each segment's inputs should be given as [v][a][a].

So, in this case, a dummy audio stream is required to pair with the second video.

ffmpeg \
-loop 1 -framerate 24 -t 10 -i item1.jpg \
-t 10 -i "https://audio-ssl.itunes.apple.com/apple-assets-us-std-000001/Music/66/58/f7/mzi.eoocfriy.aac.p.m4a" \
-i item4.mp4 \
-f lavfi -t 1 -i anullsrc \
-filter_complex \
"[0]scale=432:432,setsar=1[img1]; \
 [2]scale=432:432,setsar=1[vid1]; \
 [img1][1][vid1][3] concat=n=2:v=1:a=1" \
outputfile.mp4

The anullsrc source provides the dummy (silent) audio.

The intro audio has to be limited to the image duration (hence the -t 10 on the audio input), since the concat filter uses the duration of the longer stream in each segment.
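
The effect of that trim can be checked with a little arithmetic. The following is only a sketch: segment_length is a hypothetical helper (it is not part of ffmpeg) that prints the longest of the durations passed to it, and the numbers are the ones reported in the log in the question (image looped for 10 s, audio file 29.98 s long).

```shell
# segment_length is a hypothetical helper, not an ffmpeg feature.
# It prints the longest of the stream durations (in seconds) given as arguments,
# which is the length the answer says concat uses for that segment.
segment_length() {
  printf '%s\n' "$@" |
    awk 'BEGIN { max = 0 } $1 + 0 > max { max = $1 + 0 } END { printf "%.2f\n", max }'
}

# Intro segment without trimming the audio: 10 s image, 29.98 s audio.
segment_length 10 29.98

# Intro segment with -t 10 on the audio input: both streams are 10 s.
segment_length 10 10
```

The first call prints 29.98 and the second prints 10.00, so without the trim the intro segment would run for the full length of the audio file instead of the intended 10 s.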

Use setsar rather than setdar: SAR is the parameter that is actually changed, and with setdar the resulting SARs may not match across segments after reduction to a rational number.

n in concat should be 2, since it specifies the number of paired segments, not the total number of inputs.
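
The pad ordering and the meaning of n can be sketched in shell. This is an illustration only: concat_graph is a hypothetical helper (not part of ffmpeg) that takes v, a, and then one argument per segment, and prints the last line of the filtergraph.

```shell
# concat_graph is a hypothetical helper, not an ffmpeg feature.
# Usage: concat_graph V A SEGMENT...
# Each SEGMENT argument holds that segment's pads: V video labels, then A audio labels.
concat_graph() {
  v=$1
  a=$2
  shift 2
  n=$#            # one argument per segment, so n is the segment count
  pads=""
  for seg in "$@"; do
    pads="$pads$seg"
  done
  printf '%s concat=n=%s:v=%s:a=%s\n' "$pads" "$n" "$v" "$a"
}

# Two segments, each with one video pad followed by one audio pad.
concat_graph 1 1 "[img1][1]" "[vid1][3]"
```

With the labels from the command above this prints `[img1][1][vid1][3] concat=n=2:v=1:a=1`: four input pads, but n=2 because there are two segments.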
