Synchronization has always fascinated me, or to be precise: why a .ts can be viewed in sync by media players, while the demuxed audio+video reassembled is out of sync.
So I'm trying to understand this, and what can be done to prevent it.
I've read the following: https://trac.handbrake.fr/wiki/LibHandBrakeSync and the source of sync.c (also available on the wiki).
BitStreamTools has also written a Theory 101 on the subject (but I can't link to it as I'm a new user, sorry).
While I thought my understanding of PCR/PTS was (conceptually) right, I'm having a hard time following HandBrake's excellent A/V sync paper.
My question is this: is there a somewhat intuitive explanation of A/V synchronization (brief or long, either is fine)? While I know that one can recalculate PTS from PCR if the audio or video PTS is corrupted (discontinuity?), HandBrake does not seem to rely on this, but on its internal PTS: 0, += 1/fps (~= 5), 10, 15, ...
Would it be possible to recalculate the PTS offsets and correct the .ts (binary) by fixing all audio and video PTS values (and shifting all DTS values by the same offset, so the player doesn't "run out of frames", so to speak), and thus have a .ts which can be demuxed, with the isolated tracks then being in sync when put back together?
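To make the "fix the binary" idea concrete, here is a rough sketch of what I imagine the core step would look like: decoding the 5-byte PTS/DTS field from a PES header, adding an offset, and encoding it again. The function names (`decode_timestamp`, `encode_timestamp`, `shift_timestamp`) are made up for illustration, and the sketch assumes I have already located the field; finding the PES headers and checking which of PTS/DTS are present is not handled here.

```python
# Sketch only: shifts one 33-bit PTS or DTS value stored in the 5-byte
# PES header layout (4-bit prefix, then 3 + 15 + 15 timestamp bits,
# each group followed by a marker bit). Helper names are hypothetical.

WRAP = 1 << 33  # PTS/DTS are 33-bit counters running at 90 kHz


def decode_timestamp(field: bytes) -> int:
    """Return the 33-bit timestamp held in a 5-byte PTS/DTS field."""
    return (
        ((field[0] >> 1) & 0x07) << 30
        | field[1] << 22
        | (field[2] >> 1) << 15
        | field[3] << 7
        | field[4] >> 1
    )


def encode_timestamp(prefix: int, ts: int) -> bytes:
    """Pack a 33-bit timestamp with the given 4-bit prefix and marker bits."""
    return bytes([
        (prefix << 4) | (((ts >> 30) & 0x07) << 1) | 1,
        (ts >> 22) & 0xFF,
        (((ts >> 15) & 0x7F) << 1) | 1,
        (ts >> 7) & 0xFF,
        ((ts & 0x7F) << 1) | 1,
    ])


def shift_timestamp(field: bytes, offset: int) -> bytes:
    """Add offset (in 90 kHz ticks) to a PTS/DTS field, wrapping at 2**33."""
    prefix = field[0] >> 4  # keep the original prefix bits untouched
    return encode_timestamp(prefix, (decode_timestamp(field) + offset) % WRAP)
```

Since DTS uses the same layout, the same offset could be applied to both fields of a frame, e.g. `shift_timestamp(pts_field, 90000)` to move it one second later.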
EDIT: Or would it be possible to fix this by using the PCR to recalculate all PTS values in a given .ts? I understand that some video frames or audio packets might be damaged in broadcast and so cannot be presented correctly, but I'll leave the handling of this (such as removing the video if it's damaged and has a corresponding audio part, inserting x ms of silence if the audio packet is damaged, etc.) for later, and for the sake of discussion I'll presume all frames are intact. (But then the PTS values would always be correct, wouldn't they?)
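For reference, this is how I understand the PCR is read from a transport packet, as a sketch: a 33-bit base counting at 90 kHz (the same units as PTS) plus a 9-bit extension, together forming a 27 MHz clock value. The function names are my own, and the sketch assumes standard 188-byte packets with no extra framing.

```python
# Sketch only: pull the PCR out of one 188-byte transport stream packet.
# The 6-byte PCR field holds a 33-bit base, 6 reserved bits and a 9-bit
# extension. Function names are hypothetical.

def decode_pcr(field: bytes) -> tuple[int, int]:
    """Return (base, extension) from the 6-byte PCR field."""
    value = int.from_bytes(field[:6], "big")
    return value >> 15, value & 0x1FF


def pcr_to_27mhz(base: int, ext: int) -> int:
    """Combine base and extension into a 27 MHz tick count."""
    return base * 300 + ext


def extract_pcr(packet: bytes):
    """Return (base, extension) if the packet carries a PCR, else None."""
    if len(packet) != 188 or packet[0] != 0x47:
        return None  # not a sync-aligned TS packet
    if not packet[3] & 0x20 or packet[4] == 0:
        return None  # no adaptation field, or an empty one
    if not packet[5] & 0x10:
        return None  # PCR_flag not set
    return decode_pcr(packet[6:12])
```

Because the base is already in 90 kHz ticks, comparing a frame's PTS against the most recent PCR base directly gives how far ahead of the clock that frame is scheduled.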
Appendix: My take on the HandBrake A/V paper is this: At "expected" 100, the offset is calculated as video PTS (100) - audio PTS (0) - the internal PTS, to bring the audio up to the same presentation time, thus giving a PTS offset of 99. At 105 the offset would be 105 - 5 = 100, not 99, but we proceed to use 99 as the offset since there's no need to recalculate (100 - 99 = 1; 1/fps < 100 ms). At 150, the PTS offset is calculated again, as the video PTS is decreasing rather than increasing...
I'm almost positive I'm completely wrong about this, but can someone point me in the right direction, please?
- Josh