When measuring network latency (time ack received - time msg sent) in any protocol over TCP, which timer would you recommend, and why? What resolution does it have? What are its other advantages and disadvantages?
Optional: how does it work?
Optional: what timer would you NOT use and why?
I'm looking mostly for Windows / C++ solutions, but if you'd like to comment on other systems, feel free to do so.
(Currently we use GetTickCount(), but it's not a very accurate timer.)
This is a copy of my answer from: C++ Timer function to provide time in nano seconds
For Linux (and BSD) you want to use clock_gettime().
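#include <time.h>   // clock_gettime() and timespec are declared here
int main()
{
    timespec ts;
    // CLOCK_MONOTONIC works on both Linux and FreeBSD and never jumps
    // backwards, which makes it the better choice for measuring intervals.
    clock_gettime(CLOCK_MONOTONIC, &ts);
    // clock_gettime(CLOCK_REALTIME, &ts); // Wall-clock time; can jump when the system clock is adjusted
}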
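To turn two clock_gettime() readings into a latency figure, something along these lines might do. It is only a sketch: the helper names elapsed_ns and time_ns are made up for this example, and it assumes a POSIX system where CLOCK_MONOTONIC is available.

```cpp
#include <cassert>
#include <cstdint>
#include <time.h>   // clock_gettime, timespec, CLOCK_MONOTONIC (POSIX)

// Hypothetical helper: difference (end - start) in nanoseconds.
std::int64_t elapsed_ns(const timespec& start, const timespec& end)
{
    const std::int64_t sec  = static_cast<std::int64_t>(end.tv_sec)
                            - static_cast<std::int64_t>(start.tv_sec);
    const std::int64_t nsec = static_cast<std::int64_t>(end.tv_nsec)
                            - static_cast<std::int64_t>(start.tv_nsec);
    return sec * 1000000000LL + nsec;
}

// Hypothetical helper: runs `work` once and returns how long it took.
template <typename F>
std::int64_t time_ns(F&& work)
{
    timespec start{}, end{};
    int rc = clock_gettime(CLOCK_MONOTONIC, &start);
    assert(rc == 0);
    work();
    rc = clock_gettime(CLOCK_MONOTONIC, &end);
    assert(rc == 0);
    (void)rc;   // avoids an unused-variable warning when NDEBUG is defined
    return elapsed_ns(start, end);
}
```

In a latency test, work would send the message and block until the ack arrives, e.g. time_ns([&] { send_and_wait_for_ack(); }), where send_and_wait_for_ack stands in for your own code.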
For Windows you want to use QueryPerformanceCounter (QPC).
Apparently there is a known issue with QPC on some chipsets, so you may want to make sure you do not have one of those chipsets. Additionally, some dual-core AMD CPUs may also cause problems. See the second post by sebbbi, where he states:
QueryPerformanceCounter() and QueryPerformanceFrequency() offer a bit better resolution, but have different issues. For example in Windows XP, all AMD Athlon X2 dual core CPUs return the PC of either of the cores "randomly" (the PC sometimes jumps a bit backwards), unless you specially install AMD dual core driver package to fix the issue. We haven't noticed any other dual+ core CPUs having similar issues (p4 dual, p4 ht, core2 dual, core2 quad, phenom quad).
You mentioned that you use GetTickCount(), so I'm going to recommend that you take a look at QueryPerformanceCounter().
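For what it's worth, a QPC-based measurement could look roughly like the sketch below. The helpers ticks_to_ns and qpc_time_ns are hypothetical names invented for this example; the Windows-specific part is guarded by _WIN32 so the conversion helper still compiles elsewhere.

```cpp
#include <cassert>
#include <cstdint>

#ifdef _WIN32
#include <windows.h>   // QueryPerformanceCounter, QueryPerformanceFrequency
#endif

// Hypothetical helper: converts a tick delta to nanoseconds.
// Whole seconds and the remainder are converted separately so that
// ticks * 1e9 cannot overflow for long intervals. The remainder term is
// safe as long as ticks_per_second stays below roughly 9.2e9.
std::int64_t ticks_to_ns(std::int64_t ticks, std::int64_t ticks_per_second)
{
    assert(ticks_per_second > 0);
    const std::int64_t whole = ticks / ticks_per_second;
    const std::int64_t rest  = ticks % ticks_per_second;
    return whole * 1000000000LL + rest * 1000000000LL / ticks_per_second;
}

#ifdef _WIN32
// Hypothetical helper: runs `work` once and returns the elapsed nanoseconds.
template <typename F>
std::int64_t qpc_time_ns(F&& work)
{
    LARGE_INTEGER freq, start, end;
    QueryPerformanceFrequency(&freq);   // ticks per second
    QueryPerformanceCounter(&start);
    work();
    QueryPerformanceCounter(&end);
    return ticks_to_ns(end.QuadPart - start.QuadPart, freq.QuadPart);
}
#endif
```

The resolution you actually get is 1 / freq.QuadPart seconds, so it is worth printing the frequency once on the target machine before trusting the numbers.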
There is really no substitute for the rdtsc instruction. You cannot be sure what resolution QueryPerformanceCounter will support. Some systems have a very coarse granularity (a low increment rate/frequency), and some provide no counter at all.
Instead, I recommend you use the rdtsc instruction. It does not require any OS support and returns the number of CPU-internal clock cycles that have elapsed since the processor/core was powered up or reset. For a 3 GHz processor that's 3 billion increments per second - it doesn't get more precise than that, now does it? This instruction is available on x86-32 and x86-64 beginning with the Pentium. It should therefore be accessible from Linux on x86 as well.
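A minimal sketch of reading the counter from C++ is shown here, using the __rdtsc() compiler intrinsic rather than inline assembly. The names read_tsc and cycles_to_ns are made up for this example. The tsc_hz parameter is an assumption: the caller has to supply the counter rate (e.g. 3.0e9 for the 3 GHz case above), and the conversion only makes sense if the counter ticks at a constant rate and both readings come from the same core.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#  if defined(_MSC_VER)
#    include <intrin.h>      // __rdtsc on MSVC
#  else
#    include <x86intrin.h>   // __rdtsc on GCC and Clang
#  endif

// Reads the time-stamp counter of the core this thread is running on.
inline std::uint64_t read_tsc()
{
    return __rdtsc();
}
#endif

// Hypothetical helper: converts a cycle delta to nanoseconds.
// tsc_hz is an assumption supplied by the caller (counter ticks per second).
inline double cycles_to_ns(std::uint64_t cycles, double tsc_hz)
{
    assert(tsc_hz > 0.0);
    return static_cast<double>(cycles) * 1e9 / tsc_hz;
}
```

Typical use would be to call read_tsc() just before sending the message and again when the ack arrives, then pass the difference to cycles_to_ns().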
There are plenty of posts about it here on stackoverflow.com. I've written a few myself ...