I'm using WebRTC to send video from a server to a client browser (using the native WebRTC API and an MCU WebRTC server like Kurento).
Before it is sent to clients, each frame of the video contains metadata (such as subtitles or other application-level content). I'm looking for a way to send this metadata to the client so that it remains synchronized with the frame (that is, with the time the frame is actually presented). In addition, I would like to be able to access this data on the client side from JavaScript.
Some options I thought about:
- Sending the data over a WebRTC DataChannel. However, I couldn't find a way to ensure that the data sent over the data channel and the video stream stay synchronized, ideally with single-frame precision.
- Sending the data manually to the client in some way (WebRTC DataChannel, WebSockets, etc.) with timestamps that match the video's timestamps. However, even if Kurento or other intermediate servers preserve the timestamp information in the video, according to the following answer there is no way for an application to get the video timestamps from JavaScript: "How can use the webRTC Javascript API to access the outgoing audio RTP timestamp at the sender and the incoming audio RTP timestamp at the receiver?" I thought about using the standard video element's `timeupdate` event, but I don't know if it will work with frame-level precision, and I'm not sure what it means for a live video as in WebRTC.
- Sending the data manually and attaching it to the video in the application as another `TextTrack`, then using the `onenter` and `onexit` events to read it in sync: http://www.html5rocks.com/en/tutorials/track/basics/. This still requires precise timestamps, and I'm not sure how to find out what the timestamps are or whether Kurento passes them through as-is.
- Using the WebRTC statistics API (`getStats`) to count frames manually, and hoping that the information provided by this API is precise.
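To make the timestamp-matching options more concrete, here is a minimal sketch of the client-side bookkeeping they would need. It assumes the client can somehow obtain a timestamp for the frame being presented, in the same clock as the timestamps attached to the metadata, which is exactly the part I don't know how to do. The `MetadataBuffer` class and its `push` and `lookup` methods are hypothetical names used only for illustration; they are not part of WebRTC or Kurento.

```javascript
// Hypothetical helper: buffers timestamped metadata messages and returns
// the entry that applies to a given frame timestamp.
class MetadataBuffer {
  constructor(maxEntries = 1000) {
    this.entries = []; // kept sorted by ascending timestamp
    this.maxEntries = maxEntries;
  }

  // Store a metadata payload. Assumption: ts uses the same clock and units
  // as the frame timestamps passed to lookup().
  push(ts, payload) {
    let i = this.entries.length;
    // Messages usually arrive in order, so scan backwards from the end.
    while (i > 0 && this.entries[i - 1].ts > ts) i--;
    this.entries.splice(i, 0, { ts, payload });
    // Drop the oldest entry once the buffer is full.
    if (this.entries.length > this.maxEntries) this.entries.shift();
  }

  // Return the payload with the largest ts <= frameTs, or null if none.
  lookup(frameTs) {
    let lo = 0;
    let hi = this.entries.length; // binary search over [lo, hi)
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.entries[mid].ts <= frameTs) lo = mid + 1;
      else hi = mid;
    }
    return lo === 0 ? null : this.entries[lo - 1].payload;
  }
}
```

The idea would be to call `push` for every metadata message received (over a DataChannel or WebSocket) and `lookup` whenever a new frame is shown. The sketch only covers the matching step; it does not address how to read the frame timestamp in the browser.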
What is the best way to do this, and how can I solve the problems I mentioned with each approach?
EDIT: Precise synchronization of the metadata with the corresponding frame (to within no more than a single frame) is required.