Why does libmp3lame add zeros to the start of the

Posted 2019-07-01 14:26

Question:

I have an uncompressed .wav file that I turn into a 96k MP3 file:

ffmpeg.exe -i song.wav -vn -b:a 96000 -ac 2 -ar 48000 -acodec libmp3lame -y song.mp3

The input file has 637386 samples. The output has 639360 samples. The extra samples in the MP3 are all zeros at the beginning of the file. This happens with every file I've converted, and with more codecs than just libmp3lame. Is this an FFmpeg bug or a codec bug? Why are these samples added? Is there a way to stop them from being added?

Edit: Simplified example and console output:

ffmpeg.exe -i song.wav -y song.mp3

ffmpeg version N-55796-gb74213d Copyright (c) 2000-2013 the FFmpeg developers
  built on Aug 26 2013 19:43:51 with gcc 4.7.3 (GCC)
  configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libcaca --enable-libfreetype --enable-libgsm --enable-libilbc --enable-libmodplug --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-librtmp --enable-libschroedinger --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libxavs --enable-libxvid --enable-zlib
  libavutil      52. 42.100 / 52. 42.100
  libavcodec     55. 29.100 / 55. 29.100
  libavformat    55. 14.102 / 55. 14.102
  libavdevice    55.  3.100 / 55.  3.100
  libavfilter     3. 82.102 /  3. 82.102
  libswscale      2.  5.100 /  2.  5.100
  libswresample   0. 17.103 /  0. 17.103
  libpostproc    52.  3.100 / 52.  3.100
Guessed Channel Layout for  Input Stream #0.0 : stereo
Input #0, wav, from 'song.wav':
  Duration: 00:00:13.28, bitrate: 1538 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, stereo, s16, 1536 kb/s
Output #0, mp3, to 'song.mp3':
  Metadata:
    TSSE            : Lavf55.14.102
    Stream #0:0: Audio: mp3 (libmp3lame), 48000 Hz, stereo, s16p
Stream mapping:
  Stream #0:0 -> #0:0 (pcm_s16le -> libmp3lame)
Press [q] to stop, [?] for help
size=     208kB time=00:00:13.29 bitrate= 128.4kbits/s
video:0kB audio:208kB subtitle:0 global headers:0kB muxing overhead 0.111205%

Number of samples in wav: 637386

Number of samples in mp3: 639984

Answer 1:

The amount of delay added by LAME in FFmpeg is:

avctx->initial_padding = lame_get_encoder_delay(s->gfp) + 528 + 1;
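To make the formula concrete, here is a rough arithmetic sketch in Python. It assumes that lame_get_encoder_delay() returns 576 samples (a commonly seen value, but an assumption here, not something stated in the thread) and that the encoder only emits whole 1152-sample frames. The function names are made up for illustration and are not part of FFmpeg or LAME.

```python
import math

MP3_FRAME_SAMPLES = 1152          # samples per channel in one MPEG-1 Layer III frame
DECODER_DELAY = 528               # decoder filterbank delay quoted in the LAME FAQ
ASSUMED_LAME_ENCODER_DELAY = 576  # ASSUMPTION: typical lame_get_encoder_delay() result


def initial_padding(encoder_delay=ASSUMED_LAME_ENCODER_DELAY):
    """Mirror the expression above: encoder delay + 528 + 1."""
    return encoder_delay + DECODER_DELAY + 1


def padded_length(input_samples, encoder_delay=ASSUMED_LAME_ENCODER_DELAY):
    """Input plus initial padding, rounded up to a whole number of frames."""
    total = input_samples + initial_padding(encoder_delay)
    frames = math.ceil(total / MP3_FRAME_SAMPLES)
    return frames * MP3_FRAME_SAMPLES


print(initial_padding())      # 1105
print(padded_length(637386))  # 639360
```

Under these assumptions the result happens to equal the 639360 samples reported for the first command in the question. It does not account for the 639984 samples reported for the simplified command, so treat it as an illustration of the mechanism rather than a full explanation.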

From the FAQ of the LAME project:

2. Why does LAME add silence to the beginning each song?

DECODER DELAY AT START OF FILE:

All decoders I have tested introduce a delay of 528 samples. That is, after decoding an mp3 file, the output will have 528 samples of 0's appended to the front. This is because the standard MDCT/filterbank routines used by the ISO have a 528 sample delay. It would be possible to write a MDCT/filterbank routine with a 0 sample delay (see description of Takehiro's MDCT/filterbank routine used in LAME encoding below) but I dont know that anyone has done this. Furthermore, because of the overlapped nature of MDCT frames, the first half of the first granule (1 granule=576 samples) doesn't have a previous frame to overlap with, resulting in attenuation of the first N samples. The value of N depends on the window type. For "STOP_TYPE" and "SHORT_TYPE", N=96, while for "START_TYPE" and "NORMAL_TYPE", N=288. The first frame produced by LAME 3.56 and up will always be of STOP_TYPE or SHORT_TYPE.

ENCODER DELAY AT START OF FILE:

ISO based encoders (BladeEnc, 8hz-mp3, etc) use a MDCT/filterbank routine similar to the one used in decoding, and thus also introduce their own 528 sample delay. A .wav file encoded & decoded will have a 1056 sample delay (1056 samples will be appended to the beginning).

The delay described in the FAQ doesn't exactly match the difference in your output, probably because of technical details that I'm not aware of, but it's not a bug.
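If the leading silence is a problem after decoding, one option is to drop a known number of samples from the front of the decoded audio. The sketch below uses the 528 + 528 round-trip figure that the FAQ gives for ISO-based encoders. The helper names are hypothetical, and the audio is modelled as a plain Python list holding one channel of samples. For a real file, the delay to remove has to be determined for the encoder and decoder actually used.

```python
DECODER_DELAY = 528      # decoder delay quoted in the LAME FAQ
ISO_ENCODER_DELAY = 528  # encoder delay the FAQ gives for ISO-based encoders


def round_trip_delay(encoder_delay=ISO_ENCODER_DELAY, decoder_delay=DECODER_DELAY):
    """Total samples of leading silence after an encode + decode round trip."""
    return encoder_delay + decoder_delay


def strip_leading_delay(samples, delay):
    """Return a copy of the samples with the first `delay` entries removed."""
    return samples[delay:]


# Toy data: 1056 zeros followed by three "real" samples.
decoded = [0] * round_trip_delay() + [3, -2, 7]
print(round_trip_delay())                              # 1056
print(strip_leading_delay(decoded, round_trip_delay()))  # [3, -2, 7]
```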



Tags: ffmpeg