I'm currently having problems making my audio and video streams stay synced.
These are the AVCodecContexts I'm using:
For Video:
AVCodec* videoCodec = ffmpeg.avcodec_find_encoder(AVCodecID.AV_CODEC_ID_H264);
AVCodecContext* videoCodecContext = ffmpeg.avcodec_alloc_context3(videoCodec);
videoCodecContext->bit_rate = 400000;
videoCodecContext->width = 1280;
videoCodecContext->height = 720;
videoCodecContext->gop_size = 12;
videoCodecContext->max_b_frames = 1;
videoCodecContext->pix_fmt = videoCodec->pix_fmts[0];
videoCodecContext->codec_id = videoCodec->id;
videoCodecContext->codec_type = videoCodec->type;
videoCodecContext->time_base = new AVRational { num = 1, den = 30 };
For Audio:
AVCodec* audioCodec = ffmpeg.avcodec_find_encoder(AVCodecID.AV_CODEC_ID_AAC);
AVCodecContext* audioCodecContext = ffmpeg.avcodec_alloc_context3(audioCodec);
audioCodecContext->bit_rate = 1280000;
audioCodecContext->sample_rate = 48000;
audioCodecContext->channels = 2;
audioCodecContext->channel_layout = ffmpeg.AV_CH_LAYOUT_STEREO;
audioCodecContext->frame_size = 1024;
audioCodecContext->sample_fmt = audioCodec->sample_fmts[0];
audioCodecContext->profile = ffmpeg.FF_PROFILE_AAC_LOW;
audioCodecContext->codec_id = audioCodec->id;
audioCodecContext->codec_type = audioCodec->type;
When writing the video frames, I set the PTS as follows:
outputFrame->pts = frameIndex; // The current index of the image frame being written
I then encode the frame using avcodec_encode_video2(). After this, I call the following to set the timestamps:
ffmpeg.av_packet_rescale_ts(&packet, videoCodecContext->time_base, videoStream->time_base);
This plays perfectly.
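For reference, the arithmetic behind that rescale can be sketched in plain C without linking FFmpeg. This is only an illustration: Rational and rescale_ts are hypothetical stand-ins, not FFmpeg's AVRational or av_packet_rescale_ts. The rounding here is plain round-to-nearest for non-negative values, and the 1/15360 stream time base used in the note below is an assumed example (the muxer picks the real one).

```c
#include <stdint.h>

/* Hypothetical stand-in for a rational time base: num/den seconds per tick. */
typedef struct {
    int num;
    int den;
} Rational;

/* Convert a timestamp from time base src to time base dst:
 *   ts * (src.num / src.den) / (dst.num / dst.den), rounded to nearest.
 * Assumes ts >= 0 and that the intermediate products fit in 64 bits. */
int64_t rescale_ts(int64_t ts, Rational src, Rational dst)
{
    int64_t num = ts * src.num * dst.den;
    int64_t den = (int64_t)src.den * dst.num;
    return (num + den / 2) / den;
}
```

With a codec time base of 1/30 and an assumed stream time base of 1/15360, frame index 1 maps to 512 ticks and frame index 30 maps to 15360 ticks, which is exactly one second.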
However, when I do the same for audio, the video plays in slow motion: the audio plays first, and then the video carries on afterwards with no sound.
I cannot find an example anywhere of how to set PTS/DTS values for video/audio in an MP4 file. Any examples or help would be great!
Also, I'm writing the video frames first, after which (once they are all written) I write the audio. I've updated this question with the adjusted values suggested in the comments.
I've uploaded a test video to show my results here: http://www.filedropper.com/test_124
PS: Check out this article/tutorial on A/V Sync with FFmpeg. It might help you if the below doesn't.
1) Regarding the video & audio timestamps...
Rather than using the current frameIndex as the timestamp and rescaling it later, skip the rescale if possible. The alternative is to make sure the PTS values (in outputFrame->pts) are created correctly in the first place, using the video's frames per second (FPS). The values below are in milliseconds, so they assume a time base of 1/1000. To do this...
For each video frame:
outputFrame->pts = (1000 / FPS) * frameIndex;
(For a 30 FPS video, the first frame (frameIndex 0) has time 0, and by frameIndex 30 the "clock" has reached 1 second. 1000 / 30 gives each video frame a presentation interval of 33.333 ms, so when frameIndex is 30 we can say 33.333 x 30 = 1000 ms, or 1 second, confirming 30 frames per second.)
For each audio frame:
outputFrame->pts = ((1024 / 48000) * 1000) * frameIndex;
(Since a 48 kHz AAC frame has a duration of 21.333 ms, the timestamp increases by that amount each frame. The formula is (1024 PCM samples / sample rate) x 1000 ms per second, multiplied by the frame index. Do the division in floating point: with integer operands, 1024 / 48000 evaluates to 0 and 1000 / 30 to 33.)
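The two formulas can be sketched in plain C as below. This is a minimal illustration, assuming millisecond timestamps and 64-bit integers; video_pts_ms and audio_pts_ms are hypothetical helper names, not FFmpeg functions. Each helper multiplies before it divides, so the fractional part is only dropped once, at the end.

```c
#include <stdint.h>

/* Video: presentation time in milliseconds of frame number frame_index
 * at fps frames per second. */
int64_t video_pts_ms(int64_t frame_index, int fps)
{
    return frame_index * 1000 / fps;
}

/* Audio: presentation time in milliseconds of frame number frame_index,
 * where every frame holds samples_per_frame PCM samples (1024 for AAC). */
int64_t audio_pts_ms(int64_t frame_index, int samples_per_frame, int sample_rate)
{
    return frame_index * samples_per_frame * 1000 / sample_rate;
}
```

For example, video_pts_ms(30, 30) gives 1000, and audio_pts_ms(375, 1024, 48000) gives 8000, since 375 AAC frames at 48 kHz last exactly 8 seconds.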
2) Regarding the audio settings...
Bit-rate:
audioCodecContext->bit_rate = 64000;
seems odd if your sample_rate is 48000 Hz (and, I assume, your bit depth is 16 bits per sample?). Try either 96000 or 128000 as the lowest starting values.
Frame Size:
Considering the above quote from the docs, and that MPEG AAC does not work "per channel" (data for both L/R channels is contained within each frame), each AAC frame holds 1024 PCM samples. So instead of
audioCodecContext->frame_size = 88200;
you could try = 1024; for the size.
Profile:
I noticed you've used MAIN for the AAC profile. I'm used to seeing Low Complexity in videos. I tried a few random MP4 files from various sources on my HDD and I cannot find one using the "Main" profile. As a last resort, testing "Low Complexity" won't hurt. Try using
audioCodecContext->profile = ffmpeg.FF_PROFILE_AAC_LOW;
PS: Check this for a possible AAC issue (depending on your FFmpeg version).
Solved the problem. I've added a new function to set the video/audio positions after setting the frames' PTS values.
Video is just the usual increment (+1 for each frame), whereas audio is done as follows:
After the frame is encoded, I call my new function:
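The code for these two steps is not reproduced here. As a rough illustration only, and not necessarily the function described above, one common convention is to give the audio encoder a time base of 1/sample_rate, so that each audio frame's PTS is simply the number of samples written before it, while video keeps counting frames in a time base of 1/FPS. The names PtsCounters, next_video_pts and next_audio_pts are hypothetical.

```c
#include <stdint.h>

/* Running totals used to stamp outgoing frames. */
typedef struct {
    int64_t video_frames;   /* frames emitted so far; time base 1/fps */
    int64_t audio_samples;  /* samples emitted so far; time base 1/sample_rate */
} PtsCounters;

/* Video: the usual increment, +1 for each frame. */
int64_t next_video_pts(PtsCounters *c)
{
    return c->video_frames++;
}

/* Audio: the PTS is the sample count before this frame, and the counter
 * then advances by the number of samples in the frame (1024 for AAC). */
int64_t next_audio_pts(PtsCounters *c, int nb_samples)
{
    int64_t pts = c->audio_samples;
    c->audio_samples += nb_samples;
    return pts;
}
```

Under these assumptions, successive 1024-sample audio frames get PTS values 0, 1024, 2048, and so on. Each encoded packet would then still be rescaled from its codec time base to its stream time base, as in the video call shown in the question.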