FFMPEG Seeking brings audio artifacts

Posted 2019-04-28 15:11

Question:

I'm implementing an audio decoder using ffmpeg. Reading audio and even seeking already work, but I can't figure out how to clear the buffers after seeking, so I get artifacts when the app starts reading audio right after a seek.

avcodec_flush_buffers doesn't seem to have any effect on the internal buffers. This issue happens with all decoders (mp3, aac, wma, ...) except PCM/WAV (which doesn't use internal buffers to hold data to decode, since the audio is not compressed).

The code snippet is simple:

av_seek_frame(audioFilePack->avContext, audioFilePack->stream, posInTimeFrame, AVSEEK_FLAG_ANY);
avcodec_flush_buffers(audioFilePack->avContext->streams[audioFilePack->stream]->codec);

Explaining:

audioFilePack->avContext = FormatContext
audioFilePack->stream = stream index (also used to read audio packets)
audioFilePack->avContext->streams[audioFilePack->stream]->codec = CodecContext for the codec used

Any ideas on what I should do so I can seek and get no residual audio? Thanks!

Answer 1:

It's a bug in ffmpeg. The internal buffers aren't being flushed, and therefore when you go to get a packet/frame after flushing, you're getting the pre-seek data. It appears to be fixed as of 3-16-12, so you could incorporate this fix yourself, or upgrade ffmpeg.

http://permalink.gmane.org/gmane.comp.video.libav.devel/23455

As an update, the bug above is indeed a problem, but there's a second bug with AAC specifically.

Five months ago, another user found this bug, and it was reported as fixed: https://ffmpeg.org/trac/ffmpeg/ticket/420

The fix was a flush function being added to aacdec.c which clears its internal buffers. The problem is there are two decoders defined in aacdec.c, and only one was given the flush function pointer. If you use the other (more common) decoder, it still won't be cleared properly.

If you're in a position to build ffmpeg yourself, the fix is to add .flush = flush, to the bottom of the definition of AVCodec ff_aac_decoder (which is at the bottom of the file.)
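To illustrate why a missing function pointer makes the flush a silent no-op, here is a minimal sketch. Every type and name in it (DemoCodec, DemoDecoderState, demo_flush_buffers, demo_decoder_a, demo_decoder_b) is a hypothetical stand-in, not FFmpeg's real definitions; it only assumes that the generic flush call checks for an optional per-codec callback before invoking it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for FFmpeg's types; NOT the real ones. */
typedef struct DemoDecoderState {
    float overlap[8];   /* samples carried over from the previous frame */
} DemoDecoderState;

typedef struct DemoCodec {
    const char *name;
    void (*flush)(DemoDecoderState *s);  /* optional; NULL means no callback */
} DemoCodec;

/* Clears the state the decoder carries from one frame to the next. */
static void flush(DemoDecoderState *s)
{
    memset(s->overlap, 0, sizeof s->overlap);
}

/* One definition has the callback wired in ... */
static const DemoCodec demo_decoder_a = {
    .name  = "demo_a",
    .flush = flush,
};

/* ... the other does not, so flushing it leaves stale data behind. */
static const DemoCodec demo_decoder_b = {
    .name = "demo_b",
};

/* Codec-agnostic flush helper. Returns 1 if the decoder state was cleared. */
static int demo_flush_buffers(const DemoCodec *c, DemoDecoderState *s)
{
    assert(c && s);
    if (!c->flush)
        return 0;       /* no callback registered: nothing happens */
    c->flush(s);
    return 1;
}
```

With demo_decoder_b the helper returns 0 and the overlap samples survive the "flush", which matches the symptom in the question: pre-seek audio leaking into the first frames decoded after the seek.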

I'll let the ffmpeg guys know so hopefully it can be included in the main branch.



Answer 2:

I've never written an audio player with seek capability, but what I suspect is going on is this. Each packet of audio decodes into a snippet of the original sound wave. Normally, these snippets sequentially abut each other and the result is a continuous wave, which one hears as audio with no artifacts. When you seek, you force two snippets from disparate parts of the file to abut each other. This generally introduces a discontinuity into the resulting sound wave, which the ear perceives as a click or pop, or as you call it (I am guessing) an artifact.

Here's a more concrete example. Let's suppose that you have played the first 25 packets of audio before you seek. Let's say packet 25 decodes into a wave whose last sample is 12345. While packet 25 is being rendered to the speaker, you seek to packet 66. Let's say packet 66's first sample is -23456. Thus the digital audio stream jumps from 12345 to -23456 across the seek. This is a huge discontinuity, and will be heard as a pop.
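To put a number on that example, assuming 16-bit signed samples (the helper name seam_jump is mine, purely for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Absolute step between the last sample played before a seek and the
 * first sample played after it. Widened to 32 bits so the difference
 * of two 16-bit samples cannot overflow. */
static int32_t seam_jump(int16_t last_before, int16_t first_after)
{
    int32_t d = (int32_t)first_after - (int32_t)last_before;
    return d < 0 ? -d : d;
}
```

For the values above, seam_jump(12345, -23456) is 35801, which is more than half of the 65535-step range a 16-bit sample can cover, all within a single sample period.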

I think one solution is to grab one extra packet before you begin to seek (packet 26 in my example), decode it to an offline buffer, apply a fade-out, and then put it into the playback queue. After you seek to your desired location, take the first packet (66 in my example), decode it to another offline buffer, apply a fade-in, and then put that into the playback queue. This should ensure smooth sound waves and artifact-free seeking.
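A rough sketch of the two fades, assuming the decoded packets are interleaved signed 16-bit PCM and that a linear ramp is good enough. The function names fade_out_s16 and fade_in_s16 are made up for this example; they are not part of ffmpeg.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Linear fade-out in place: gain is 1 on the first frame, 0 on the last.
 * buf holds frames * channels interleaved samples. */
static void fade_out_s16(int16_t *buf, size_t frames, int channels)
{
    assert(buf && channels > 0);
    for (size_t i = 0; i < frames; i++) {
        float gain = frames > 1
            ? (float)(frames - 1 - i) / (float)(frames - 1)
            : 0.0f;
        for (int c = 0; c < channels; c++) {
            size_t k = i * (size_t)channels + (size_t)c;
            buf[k] = (int16_t)((float)buf[k] * gain);
        }
    }
}

/* Linear fade-in in place: gain is 0 on the first frame, 1 on the last. */
static void fade_in_s16(int16_t *buf, size_t frames, int channels)
{
    assert(buf && channels > 0);
    for (size_t i = 0; i < frames; i++) {
        float gain = frames > 1
            ? (float)i / (float)(frames - 1)
            : 0.0f;
        for (int c = 0; c < channels; c++) {
            size_t k = i * (size_t)channels + (size_t)c;
            buf[k] = (int16_t)((float)buf[k] * gain);
        }
    }
}
```

You would call fade_out_s16 on the buffer decoded from packet 26 and fade_in_s16 on the buffer decoded from packet 66 before queueing them. Since the fade-out ends at zero and the fade-in starts at zero, the two buffers meet without a jump.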

If you are clever, you can make the fade-out and fade-in as short or long as you want. I think only a few milliseconds ought to be enough to prevent artifacts. You could even apply a cross-fade from the old and new packets. It might also be sufficient to merely note the last sample value in the last packet before the seek, and gradually step it down to zero over a few samples, rather than pulling it to zero immediately. This might be easier than decoding an extra packet.
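Here is how the two cheaper variants might look, again assuming mono signed 16-bit samples. Both helper names (ramp_to_zero_s16, crossfade_s16) are hypothetical, and the linear weighting is just the simplest choice, not something taken from an existing player.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes n samples that step from just below `last` down to exactly 0,
 * to be played after the final pre-seek sample instead of dropping
 * straight to the post-seek audio. */
static void ramp_to_zero_s16(int16_t last, int16_t *out, size_t n)
{
    assert(out || n == 0);
    for (size_t i = 0; i < n; i++)
        out[i] = (int16_t)((int64_t)last * (int64_t)(n - 1 - i) / (int64_t)n);
}

/* Linear cross-fade over n samples: out starts as old_tail and ends as
 * new_head. 64-bit intermediates keep the weighted sum from overflowing. */
static void crossfade_s16(const int16_t *old_tail, const int16_t *new_head,
                          int16_t *out, size_t n)
{
    assert((old_tail && new_head && out) || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (n == 1) {
            out[i] = new_head[i];
            continue;
        }
        int64_t w_new = (int64_t)i;
        int64_t w_old = (int64_t)(n - 1 - i);
        out[i] = (int16_t)((old_tail[i] * w_old + new_head[i] * w_new)
                           / (int64_t)(n - 1));
    }
}
```

At 44.1 kHz, a ramp of a couple hundred samples is about 5 ms, which is in the "few milliseconds" range suggested above. The cross-fade needs decoded audio from both sides of the seek, whereas the ramp only needs the last sample value.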

This is my guess about how this problem could be addressed. This is clearly a solved problem, so I encourage you to also look at open-source audio players and see how they implement seeking. Programs like Audacity, Totem, Banshee, RhythmBox, Amarok, or VLC, or frameworks like GStreamer might be good examples to learn from. If you find they employ notable techniques, please report on them here. I think people will want to learn what they are. Good luck!