I'm developing an app that records video from a webcam and audio from a microphone. I've been using Qt, but unfortunately the camera module does not work on Windows, which led me to use FFmpeg to record the video/audio.
My camera module is now working well apart from a slight syncing problem. The audio and video sometimes end up out of sync by a small amount (less than a second, I'd say, although it might be worse with longer recordings).
When I encode the frames, I set the PTS in the following way, which I took from the muxing.c example (sketched below):
- For the video frames I increment the PTS by one per frame (starting at 0).
- For the audio frames I increment the PTS by the nb_samples of the audio frame (starting at 0).
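In other words, the fixed-increment scheme looks roughly like this (a simplified sketch; nextVideoPts and nextAudioPts stand in for my own counters):

// Video frame: the PTS counts frames, in the video codec time base (1/25 at 25 fps).
outFrame->pts = nextVideoPts++;

// Audio frame: the PTS counts samples, in the audio codec time base (1/sample_rate).
outFrame->pts = nextAudioPts;
nextAudioPts += outFrame->nb_samples;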
I am saving the file at 25 fps and asking the camera to give me 25 fps (which it can). I am also converting the video frames to the YUV420P format. For the audio frames I need an AVAudioFifo, because the microphone delivers frames with more samples than the MP4 stream's encoder accepts, so I have to split them into chunks. I used the transcode.c example for this.
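Roughly, the FIFO part looks like this (a simplified sketch following transcode.c; fifo, audioCodecContext and encodeAudioFrame stand in for my own members, and error handling is omitted):

// Queue the (converted) microphone samples.
av_audio_fifo_realloc(fifo, av_audio_fifo_size(fifo) + inFrame->nb_samples);
av_audio_fifo_write(fifo, (void **)inFrame->data, inFrame->nb_samples);

// Drain encoder-sized chunks (frame_size samples each).
while (av_audio_fifo_size(fifo) >= audioCodecContext->frame_size) {
    AVFrame *outFrame = av_frame_alloc();
    outFrame->nb_samples = audioCodecContext->frame_size;
    outFrame->channel_layout = audioCodecContext->channel_layout;
    outFrame->format = audioCodecContext->sample_fmt;
    outFrame->sample_rate = audioCodecContext->sample_rate;
    av_frame_get_buffer(outFrame, 0);
    av_audio_fifo_read(fifo, (void **)outFrame->data, outFrame->nb_samples);
    encodeAudioFrame(outFrame); // sets the PTS and sends the chunk to the encoder
    av_frame_free(&outFrame);
}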
I am out of ideas as to what I should do to sync the audio and video. Do I need to use a clock or something to correctly sync up the two streams?
The full code is too big to post here, but if necessary I can put it on GitHub, for example.
Here is the code for writing a frame:
int FFCapture::writeFrame(const AVRational *time_base, AVStream *stream, AVPacket *pkt) {
    /* Rescale output packet timestamp values from codec to stream time base. */
    av_packet_rescale_ts(pkt, *time_base, stream->time_base);
    pkt->stream_index = stream->index;
    /* Write the compressed frame to the media file. */
    return av_interleaved_write_frame(oFormatContext, pkt);
}
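For context, writeFrame() gets called from my encode loop, which looks roughly like this (simplified to the send/receive API; encCtx and stream stand in for my actual members):

AVPacket *pkt = av_packet_alloc();
if (avcodec_send_frame(encCtx, outFrame) == 0) {
    while (avcodec_receive_packet(encCtx, pkt) == 0) {
        writeFrame(&encCtx->time_base, stream, pkt);
        av_packet_unref(pkt);
    }
}
av_packet_free(&pkt);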
Code for getting the elapsed time (in milliseconds):
qint64 FFCapture::getElapsedTime(qint64 *previousTime) {
    qint64 newTime = timer.elapsed();
    /* Returns -1 when the clock has not advanced, to avoid handing out a duplicate timestamp. */
    if (newTime > *previousTime) {
        *previousTime = newTime;
        return newTime;
    }
    return -1;
}
Code for adding the PTS (video and audio stream, respectively):
// Video stream:
qint64 time = getElapsedTime(&previousVideoTime);
if (time >= 0) outFrame->pts = time;
//if (time >= 0) outFrame->pts = av_rescale_q(time, outStream.videoStream->codec->time_base, outStream.videoStream->time_base);

// Audio stream:
qint64 time = getElapsedTime(&previousAudioTime);
if (time >= 0) {
    AVRational aux;
    aux.num = 1;
    aux.den = 1000; // the elapsed time is in milliseconds
    outFrame->pts = time;
    //outFrame->pts = av_rescale_q(time, aux, outStream.audioStream->time_base);
}
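For reference, this is what I understand the commented-out rescaling above is meant to do, given that timer.elapsed() reports milliseconds (a 1/1000 time base); the audio case would be analogous:

// Express the elapsed milliseconds in the codec time base;
// writeFrame() later rescales from codec to stream time base.
AVRational msTimeBase = {1, 1000};
outFrame->pts = av_rescale_q(time, msTimeBase, outStream.videoStream->codec->time_base);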