I'm currently having problems making my audio and video streams stay synced.
These are the AVCodecContexts I'm using:
For Video:
AVCodec* videoCodec = ffmpeg.avcodec_find_encoder(AVCodecID.AV_CODEC_ID_H264);
AVCodecContext* videoCodecContext = ffmpeg.avcodec_alloc_context3(videoCodec);
videoCodecContext->bit_rate = 400000;
videoCodecContext->width = 1280;
videoCodecContext->height = 720;
videoCodecContext->gop_size = 12;
videoCodecContext->max_b_frames = 1;
videoCodecContext->pix_fmt = videoCodec->pix_fmts[0];
videoCodecContext->codec_id = videoCodec->id;
videoCodecContext->codec_type = videoCodec->type;
videoCodecContext->time_base = new AVRational { num = 1, den = 30 };
For Audio:
AVCodec* audioCodec = ffmpeg.avcodec_find_encoder(AVCodecID.AV_CODEC_ID_AAC);
AVCodecContext* audioCodecContext = ffmpeg.avcodec_alloc_context3(audioCodec);
audioCodecContext->bit_rate = 1280000;
audioCodecContext->sample_rate = 48000;
audioCodecContext->channels = 2;
audioCodecContext->channel_layout = ffmpeg.AV_CH_LAYOUT_STEREO;
audioCodecContext->frame_size = 1024;
audioCodecContext->sample_fmt = audioCodec->sample_fmts[0];
audioCodecContext->profile = ffmpeg.FF_PROFILE_AAC_LOW;
audioCodecContext->codec_id = audioCodec->id;
audioCodecContext->codec_type = audioCodec->type;
When writing the video frames, I set the PTS as follows:
outputFrame->pts = frameIndex; // The current index of the image frame being written
I then encode the frame using avcodec_encode_video2(). After this, I call the following to set up the timestamps:
ffmpeg.av_packet_rescale_ts(&packet, videoCodecContext->time_base, videoStream->time_base);
This plays perfectly.
However, when I do the same for audio, the video plays in slow motion: the audio plays first, and then the video carries on afterwards with no sound.
I cannot find an example anywhere of how to set PTS/DTS values for video and audio in an MP4 file. Any examples or help would be great!
Also, I'm writing the video frames first, after which (once they are all written) I write the audio. I've updated this question with the adjusted values suggested in the comments.
I've uploaded a test video to show my results here: http://www.filedropper.com/test_124
PS: Check out this article/tutorial on A/V Sync with FFmpeg. It might help you if the below doesn't.
1) Regarding the video & audio timestamps...
Rather than using the current frameIndex as the timestamp and rescaling it later, skip the rescale if possible.
The alternative is to make sure the PTS values (in outputFrame->pts) are created correctly in the first place by using the video's frames-per-second (FPS). To do this...
For each Video frame : outputFrame->pts = (1000 / FPS) * frameIndex;
(For a 30 FPS video, the first frame has a time of 0, and by frame 30 the "clock" has reached 1 second. 1000 / 30 gives each video frame a presentation interval of 33.333 ms. When frameIndex is 30, 33.333 x 30 = 1000 ms, or 1 second, confirming 30 frames for each second.)
For each Audio frame : outputFrame->pts = ((1024 / 48000) * 1000) * frameIndex;
(Since a 48 kHz AAC frame has a duration of 21.333 ms, the timestamp increases by that amount for each frame. The formula is: (1024 PCM samples / sample rate) x 1000 ms per second, then multiplied by the frame index.)
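One caveat when turning the two formulas above into code: with integer operands, 1000 / FPS truncates to 33 and 1024 / 48000 truncates to 0, so it is safer to multiply first and divide last. The sketch below is plain C with no FFmpeg dependency (the snippets in this thread are C# over the same C API). The function names are made up for illustration, and it assumes the PTS is expressed in milliseconds, i.e. a 1/1000 time base.

```c
#include <stdint.h>

/* Hypothetical helper: video PTS in milliseconds for a given frame index.
   Multiplying before dividing avoids truncating 1000 / fps to 33. */
int64_t video_pts_ms(int64_t frame_index, int fps)
{
    return frame_index * 1000 / fps;
}

/* Hypothetical helper: audio PTS in milliseconds for a given frame index,
   where each frame holds samples_per_frame samples (1024 for AAC). */
int64_t audio_pts_ms(int64_t frame_index, int samples_per_frame, int sample_rate)
{
    return frame_index * samples_per_frame * 1000 / sample_rate;
}

/* The formula typed literally with integers, for comparison:
   1024 / 48000 is 0 in integer arithmetic, so every frame gets PTS 0. */
int64_t naive_audio_pts_ms(int64_t frame_index, int samples_per_frame, int sample_rate)
{
    return (samples_per_frame / sample_rate) * 1000 * frame_index;
}
```

For 30 FPS, video_pts_ms(30, 30) is 1000. For 48 kHz AAC, audio_pts_ms(375, 1024, 48000) is 8000, since 375 frames of 1024 samples are exactly 8 seconds, while the naive version returns 0 for every frame.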
2) Regarding the audio settings...
Bit-rate :
audioCodecContext->bit_rate = 64000; seems odd if your sample_rate is 48000 Hz (and, I assume, your bit depth is 16 bits per sample?). Try either 96000 or 128000 as the lowest starting values.
Frame Size :
int AVCodecContext::frame_size means "Number of samples per channel in an audio frame".
Considering the above quote from the docs, and that MPEG AAC does not do "per channel" (since data for both L/R channels is contained within each frame), each AAC frame holds 1024 PCM samples.
Instead of audioCodecContext->frame_size = 88200; you could try audioCodecContext->frame_size = 1024;
Profile :
I noticed you've used MAIN for the AAC profile. I'm used to seeing Low Complexity in videos. I tried a few random MP4 files from various sources on my HDD and I cannot find one using the "Main" profile. As a last resort, testing "Low Complexity" won't hurt.
Try using audioCodecContext->profile = ffmpeg.FF_PROFILE_AAC_LOW;
PS: Check this for a possible AAC issue (depending on your FFmpeg version).
Solved the problem. I've added a new function that sets the packet's video/audio timestamps after the frame's PTS has been set.
Video is just the usual increment (+1 for each frame), whereas audio is done as follows:
outputFrame->pts = ffmpeg.av_rescale_q(m_audioFrameSampleIncrement, new AVRational { num = 1, den = 48000 }, m_audioCodecContext->time_base);
m_audioFrameSampleIncrement += outputFrame->nb_samples;
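The idea behind these two lines is a running sample counter: each audio frame's PTS is the number of samples written before it, in units of 1/48000 s, which av_rescale_q then converts to the codec time base. Here is a minimal sketch of just the counter, in plain C without FFmpeg. AudioClock and audio_clock_next_pts are hypothetical names, and the result is left in sample units rather than rescaled.

```c
#include <stdint.h>

/* Hypothetical running clock: total samples handed to the encoder so far. */
typedef struct {
    int64_t samples_written;
} AudioClock;

/* Returns the PTS (in samples, i.e. a 1/sample_rate time base) for a frame
   of nb_samples samples, then advances the clock past that frame. */
int64_t audio_clock_next_pts(AudioClock *clock, int nb_samples)
{
    int64_t pts = clock->samples_written;
    clock->samples_written += nb_samples;
    return pts;
}
```

Counting samples instead of frames keeps the timestamps exact even if a frame (typically the last one) carries fewer than 1024 samples.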
After the frame is encoded, I call my new function:
private static void SetPacketProperties(ref AVPacket packet, AVCodecContext* codecContext, AVStream* stream)
{
    // Convert timestamps from the codec time base to the stream (container) time base.
    packet.pts = ffmpeg.av_rescale_q_rnd(packet.pts, codecContext->time_base, stream->time_base, AVRounding.AV_ROUND_NEAR_INF | AVRounding.AV_ROUND_PASS_MINMAX);
    packet.dts = ffmpeg.av_rescale_q_rnd(packet.dts, codecContext->time_base, stream->time_base, AVRounding.AV_ROUND_NEAR_INF | AVRounding.AV_ROUND_PASS_MINMAX);
    packet.duration = (int)ffmpeg.av_rescale_q(packet.duration, codecContext->time_base, stream->time_base);
    packet.stream_index = stream->index;
}
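For readers unfamiliar with what the rescale calls do: they convert a timestamp from one time base to another, i.e. ts * (from / to), rounded to the nearest integer. A rough plain-C sketch of that arithmetic follows. It is not FFmpeg's implementation (which also guards against overflow and handles negative and special values); Rational and rescale_ts are illustrative names, and the 1/90000 stream time base in the usage note is only an example value.

```c
#include <stdint.h>

/* Illustrative stand-in for a rational time base (num / den seconds per tick). */
typedef struct {
    int num;
    int den;
} Rational;

/* Rescale a non-negative timestamp from time base `from` to time base `to`,
   rounding to the nearest integer (exact halves round up).
   Assumes the intermediate product fits in 64 bits. */
int64_t rescale_ts(int64_t ts, Rational from, Rational to)
{
    int64_t numer = (int64_t)from.num * to.den;
    int64_t denom = (int64_t)from.den * to.num;
    return (ts * numer + denom / 2) / denom;
}
```

Under these assumptions, video frame 30 at a 1/30 time base maps to 90000 ticks at 1/90000 (one second), and an audio PTS of 1024 samples at 1/48000 maps to 1920 ticks.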