I have a project where we are using gstreamer, x264, etc., to multicast a video stream over a local network to multiple receivers (dedicated computers attached to monitors). We're using gstreamer on both the video source (camera) systems and the display monitors.
We're using RTP, payload 96, and libx264 to encode the video stream (no audio).
But now I need to quantify the latency between (as close as possible to) frame acquisition and display.
Does anyone have suggestions that use the existing software?
Ideally I'd like to be able to run the testing software for a few hours to generate enough statistics to quantify the system. That means I can't do one-off tests like pointing the source camera at the receiving display monitor while it shows a high-resolution timestamp and manually calculating the difference...
I do realise that using a pure software-only solution, I will not be able to quantify the video acquisition delay (i.e. CCD to framebuffer).
I can arrange for the system clocks on the source and display systems to be synchronised to high accuracy (using PTP), so I will be able to trust the system clocks (otherwise I will use some software to track the difference between the system clocks and remove it from the test results).
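For the fallback case, an NTP-style exchange is the kind of thing I have in mind: the sender timestamps a request (t0), the receiver records its receive (t1) and reply (t2) times, and the sender records the reply's arrival (t3). A minimal sketch of just the arithmetic, assuming symmetric network delay (the function names are mine):

#include <sys/time.h>

/* Convert a timeval to microseconds. */
static long long
tv_to_us (const struct timeval *tv)
{
  return (long long) tv->tv_sec * 1000000LL + tv->tv_usec;
}

/* Given the four timestamps of one request/reply exchange
 * (t0: client send, t1: server receive, t2: server send,
 * t3: client receive), estimate the offset between the two
 * clocks. Positive means the server clock is ahead. Assumes
 * the network delay is the same in both directions.
 * e.g. t0=1000, t1=1500, t2=1510, t3=1020 us
 *   -> ((500) + (490)) / 2 = 495 us. */
static long long
clock_offset_us (long long t0, long long t1, long long t2, long long t3)
{
  return ((t1 - t0) + (t2 - t3)) / 2;
}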
In case it helps, the project applications are written in C++, so I can use C event callbacks if they're available. I'm considering embedding the system time in a custom header (e.g. frame xyz, encoded at time TTT) and using the same information on the receiver to calculate the difference.
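For example, in GStreamer 0.10 a buffer probe on any pad would let me log the wall-clock time as each buffer passes; something along these lines (a sketch only, the callback name is mine):

#include <gst/gst.h>
#include <sys/time.h>

/* Called once for every buffer that passes the probed pad;
 * logs the buffer's pipeline timestamp against the wall clock. */
static gboolean
log_buffer_time (GstPad * pad, GstBuffer * buf, gpointer user_data)
{
  struct timeval now;

  gettimeofday (&now, NULL);
  g_print ("pts %" GST_TIME_FORMAT " wall %ld.%06ld\n",
      GST_TIME_ARGS (GST_BUFFER_TIMESTAMP (buf)),
      (long) now.tv_sec, (long) now.tv_usec);
  return TRUE;                  /* TRUE lets the buffer continue */
}

/* Attach it with:
 *   gst_pad_add_buffer_probe (pad, G_CALLBACK (log_buffer_time), NULL);
 */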
I have done this before by writing a simple application that renders sequential numbers (say, mod 60) and displays them on the screen. You can then point your camera at the monitor and have one of your client machines render that stream to a second monitor. Take a picture with your phone and look at the two numbers to compute your latency.
The latency-clock project has been brought to my attention, and I think it provides a much better solution!
It embeds a binary representation of the current time into the image buffer, and extracts that binary image on decode.
Obviously the system clocks must be synchronised!
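For illustration, the core idea might look something like this sketch (my own simplification, not the project's actual code; it assumes data points at the luma plane of a frame at least NBITS * BLOCK pixels wide, and the block layout is my choice):

#include <glib.h>

#define NBITS 32                /* bits of the millisecond timestamp */
#define BLOCK 8                 /* square size in pixels */

/* Sender: draw each bit of stamp_ms as a black (0) or white (255)
 * BLOCK x BLOCK square along the top of the luma plane. */
static void
embed_timestamp (guint8 * data, int width, guint32 stamp_ms)
{
  int bit, x, y;

  for (bit = 0; bit < NBITS; bit++) {
    guint8 v = ((stamp_ms >> bit) & 1) ? 255 : 0;
    for (y = 0; y < BLOCK; y++)
      for (x = 0; x < BLOCK; x++)
        data[y * width + bit * BLOCK + x] = v;
  }
}

/* Receiver: threshold the centre pixel of each square to recover the
 * bits. Sampling a single pixel survives mild compression; averaging
 * the whole block would be more robust. */
static guint32
extract_timestamp (const guint8 * data, int width)
{
  guint32 stamp = 0;
  int bit;

  for (bit = 0; bit < NBITS; bit++) {
    guint8 v = data[(BLOCK / 2) * width + bit * BLOCK + BLOCK / 2];
    if (v > 127)
      stamp |= 1u << bit;
  }
  return stamp;
}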
I have a solution to this:
I wrote a gstreamer filter plugin (based on the plugin templates) that saves the system time when a frame is captured (and makes a mark on the video buffer) before passing it on to the H.264 encoder and network transport.
On the receiving side, I locate the mark (which gives me a 1-of-20 index) and again note the system time.
I hope it will be a relatively simple exercise to then correlate indices and compare system times. As long as the two systems' clocks are reasonably in sync (or have a known difference), I should be able to calculate the difference, which is the latency.
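As a sketch, the offline correlation could be as simple as walking the two logs in frame order (file names here are mine; it assumes the filter below ran with silent=FALSE, so each line is "timestamp_ms, index", and that fewer than 20 frames are ever lost in a row, so a matching mod-20 index identifies the same frame):

#include <stdio.h>

int
main (void)
{
  FILE *tx = fopen ("sender.log", "r");
  FILE *rx = fopen ("receiver.log", "r");
  unsigned long t_tx, t_rx;
  unsigned int i_tx, i_rx;

  if (!tx || !rx)
    return 1;

  /* For each received frame, advance the sender log until the
   * indices match; sender entries that get skipped correspond to
   * frames lost in transit. */
  while (fscanf (rx, "%lu, %u", &t_rx, &i_rx) == 2) {
    int found = 0;

    while (fscanf (tx, "%lu, %u", &t_tx, &i_tx) == 2) {
      if (i_tx == i_rx) {
        found = 1;
        break;
      }
    }
    if (!found)
      break;                    /* sender log exhausted */
    printf ("index %2u latency %ld ms\n", i_rx, (long) (t_rx - t_tx));
  }

  fclose (tx);
  fclose (rx);
  return 0;
}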
The filter->source member (set via the src property in the launch lines below) is set differently on the sender and the receiver, to determine whether the filter marks frames or locates the marks.
/* chain function
 * this function does the actual processing
 */
static GstFlowReturn
gst_my_filter_chain (GstPad * pad, GstBuffer * buf)
{
  GstMyFilter *filter;
  struct timeval nowTimeval;
  guint8 *data;
  int i, j, offset;

  filter = GST_MYFILTER (GST_OBJECT_PARENT (pad));

  /* On the first buffer, record the start time and read the frame
   * dimensions from the buffer caps. */
  if (filter->startTime == 0) {
    GstCaps *caps;
    gint width, height;
    const GstStructure *str;

    filter->startTime = GST_BUFFER_TIMESTAMP (buf);
    gettimeofday (&filter->startTimeval, NULL);
    /* tv_sec * 1e6 + tv_usec is microseconds; /1e3 gives milliseconds */
    filter->startTimeUL =
        (filter->startTimeval.tv_sec * 1e6 + filter->startTimeval.tv_usec) / 1e3;
    filter->index = 0;

    caps = GST_BUFFER_CAPS (buf);
    str = gst_caps_get_structure (caps, 0);
    if (!gst_structure_get_int (str, "width", &width) ||
        !gst_structure_get_int (str, "height", &height)) {
      g_print ("No width/height available\n");
    } else {
      g_print ("The video size of this set of capabilities is %dx%d\n",
          width, height);
      filter->width = width;
      filter->height = height;
    }
  }

  gettimeofday (&nowTimeval, NULL);
  /* wall-clock time in milliseconds */
  unsigned long timeNow = (nowTimeval.tv_sec * 1e6 + nowTimeval.tv_usec) / 1e3;

  if (filter->silent == FALSE) {
    fprintf (filter->ofp, "%20lu,", timeNow);
  }

  data = GST_BUFFER_DATA (buf);

  if (filter->source) {
    /* Sender: paint a dark 10x10 box into the luma plane; its
     * horizontal position encodes the frame index mod 20. */
    offset = filter->index % 20;
    for (i = 0; i < 10; i++) {
      for (j = 0; j < 10; j++) {
        data[(i + 20) * filter->width + (j + offset * 10)] = 23;
      }
    }
    fprintf (filter->ofp, " %d", offset);
  } else {
    /* Receiver: scan the 20 candidate positions and pick the darkest
     * one; that is where the sender painted the box. */
    unsigned long avg;
    unsigned long min = (unsigned long) -1;
    unsigned int minpos = 0;
    int k;

    for (k = 0; k < 20; k++) {
      avg = 0;
      i = 5;                    /* sample the middle row of the box */
      for (j = 0; j < 10; j++) {
        avg += data[(i + 20) * filter->width + (j + k * 10)];
      }
      if (avg < min) {
        min = avg;
        minpos = k;
      }
    }
    fprintf (filter->ofp, " %u", minpos);
  }

  fprintf (filter->ofp, "\n");
  filter->index++;

  /* just push out the incoming buffer without touching it */
  return gst_pad_push (filter->srcpad, buf);
}
Usage is as follows:
Sender / server:
GST_DEBUG="*:2" gst-launch-0.10 -v --gst-plugin-path=../../src/.libs videotestsrc num-buffers=100 ! myfilter src=1 ! x264enc tune=zerolatency speed-preset=fast ! rtph264pay ! udpsink port=3000 host=127.0.0.1
Receiver / client:
GST_DEBUG="*:2" gst-launch-0.10 -v --gst-plugin-path=../../src/.libs udpsrc port=3000 ! "application/x-rtp, media=(string)video, encoding-name=(string)H264, payload=(int)96" ! gstrtpjitterbuffer do-lost=true ! rtph264depay ! ffdec_h264 ! myfilter src=0 ! ffmpegcolorspace ! ximagesink
Obviously, in the testing implementation I am not going to be using localhost (127.0.0.1)!!
I use the --gst-plugin-path option because I have not installed my timing filter.
The project requires a latency as small as possible - ideally 100 ms or less. Now, with some numbers, I can start fine-tuning the required parameters to minimize the latency.