FFmpeg check audio channels for silence

2020-06-25 05:10发布

问题:

I have two .mp4 files, both having 8 (7.1) audio channels. But in fact, I've been told that one has a stereo audio channel + 2 SAP (secondary audio on channels 7-8), and the other one has 6 (5.1) audio channels + 2 SAP (on channels 7-8). So basically the later one has some [real] audio channels such as Center channel where that doesn't exist in the former stereo one (although it has those channels, but apparently they are silent/mute).

I've been trying to see some differentiating metadata to somehow differentiate the two using Mediainfo, but the metadata for both look exactly the same. Also tried some basic metadata retrieval with ffmpeg and ffprobe, again they both look the same - no luck:

ffprobe -i 2ch.mp4 -show_streams -select_streams a:0

So the question is: Does ffmpeg or ffprobe have any quick ways to differentiate those two? Are there any audio filters that can detect if a specific audio channel is silent or not? Or any other differentiating metadata? I would prefer differentiating the two through some metadata than content analysis.

This is a sample of 2-channel mp4 file, and this one is a sample of the 6-channel mp4.

回答1:

Both of your sample files have 4 audio streams or tracks. Each audio track has 2 channels, with a layout of stereo.

Apparently, the audio encoder is constant bit-rate, and so the metadata cannot be used to distinguish silent tracks from sound-bearing ones.

You'll need to parse each suspect audio stream.

ffmpeg -i file -map 0:a:1 -af astats -f null -

At the end of the console log, statistics for the audio stream will be printed,

e.g.

[Parsed_astats_0 @ 0000000003c3aec0] Channel: 1
[Parsed_astats_0 @ 0000000003c3aec0] DC offset: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Min level: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Max level: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Min difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Max difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Mean difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] RMS difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Peak level dB: -6153.053111
[Parsed_astats_0 @ 0000000003c3aec0] RMS level dB: -inf
[Parsed_astats_0 @ 0000000003c3aec0] RMS peak dB: -3076.526556
[Parsed_astats_0 @ 0000000003c3aec0] RMS trough dB: -inf
[Parsed_astats_0 @ 0000000003c3aec0] Crest factor: 1.000000
[Parsed_astats_0 @ 0000000003c3aec0] Flat factor: -inf
[Parsed_astats_0 @ 0000000003c3aec0] Peak count: 662528
[Parsed_astats_0 @ 0000000003c3aec0] Bit depth: 0/0
[Parsed_astats_0 @ 0000000003c3aec0] Dynamic range: -inf
[Parsed_astats_0 @ 0000000003c3aec0] Zero crossings: 0
[Parsed_astats_0 @ 0000000003c3aec0] Zero crossings rate: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Channel: 2
[Parsed_astats_0 @ 0000000003c3aec0] DC offset: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Min level: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Max level: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Min difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Max difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Mean difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] RMS difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Peak level dB: -6153.053111
[Parsed_astats_0 @ 0000000003c3aec0] RMS level dB: -inf
[Parsed_astats_0 @ 0000000003c3aec0] RMS peak dB: -3076.526556
[Parsed_astats_0 @ 0000000003c3aec0] RMS trough dB: -inf
[Parsed_astats_0 @ 0000000003c3aec0] Crest factor: 1.000000
[Parsed_astats_0 @ 0000000003c3aec0] Flat factor: -inf
[Parsed_astats_0 @ 0000000003c3aec0] Peak count: 662528
[Parsed_astats_0 @ 0000000003c3aec0] Bit depth: 0/0
[Parsed_astats_0 @ 0000000003c3aec0] Dynamic range: -inf
[Parsed_astats_0 @ 0000000003c3aec0] Zero crossings: 0
[Parsed_astats_0 @ 0000000003c3aec0] Zero crossings rate: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Overall
[Parsed_astats_0 @ 0000000003c3aec0] DC offset: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Min level: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Max level: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Min difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Max difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Mean difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] RMS difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Peak level dB: -6153.053111
[Parsed_astats_0 @ 0000000003c3aec0] RMS level dB: -inf
[Parsed_astats_0 @ 0000000003c3aec0] RMS peak dB: -3076.526556
[Parsed_astats_0 @ 0000000003c3aec0] RMS trough dB: -inf
[Parsed_astats_0 @ 0000000003c3aec0] Flat factor: -inf
[Parsed_astats_0 @ 0000000003c3aec0] Peak count: 662528.000000
[Parsed_astats_0 @ 0000000003c3aec0] Bit depth: 0/0
[Parsed_astats_0 @ 0000000003c3aec0] Number of samples: 662528

If the RMS level dB is -inf, then that channel is silent.