I am running this test on a CPU with constant_tsc and nonstop_tsc:
$ grep -m 1 ^flags /proc/cpuinfo | sed 's/ /\n/g' | egrep "constant_tsc|nonstop_tsc"
constant_tsc
nonstop_tsc
Step 1: Calculate the tick rate of the tsc:
I calculate _ticks_per_ns as the median over a number of observations. I use rdtscp to ensure in-order execution.
static const int trials = 13;
std::array<double, trials> rates;

for (int i = 0; i < trials; ++i)
{
    timespec beg_ts, end_ts;
    uint64_t beg_tsc, end_tsc;

    clock_gettime(CLOCK_MONOTONIC, &beg_ts);
    beg_tsc = rdtscp();

    uint64_t elapsed_ns;
    do
    {
        clock_gettime(CLOCK_MONOTONIC, &end_ts);
        end_tsc = rdtscp();
        elapsed_ns = to_ns(end_ts - beg_ts); // calculates ns between two timespecs
    }
    while (elapsed_ns < 10 * 1e6); // busy spin for 10ms

    rates[i] = (double)(end_tsc - beg_tsc) / (double)elapsed_ns;
}

std::nth_element(rates.begin(), rates.begin() + trials/2, rates.end());
_ticks_per_ns = rates[trials/2];
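For reference, these snippets call rdtscp(), to_ns(), and a timespec subtraction that aren't shown. A minimal sketch of plausible definitions, assuming GCC/Clang on x86-64 (only the names come from the question; the bodies are my guesses):

#include <cstdint>
#include <ctime>
#include <x86intrin.h> // __rdtscp

// rdtscp waits for earlier instructions to complete before reading the
// counter, giving the in-order behavior mentioned above.
static inline uint64_t rdtscp()
{
    unsigned int aux; // receives IA32_TSC_AUX (identifies the core)
    return __rdtscp(&aux);
}

// Nanoseconds represented by a timespec.
static inline uint64_t to_ns(const timespec& ts)
{
    return uint64_t(ts.tv_sec) * 1000000000ULL + ts.tv_nsec;
}

// Difference of two timespecs, so that to_ns(end_ts - beg_ts) works.
static inline timespec operator-(const timespec& end, const timespec& beg)
{
    timespec d;
    d.tv_sec  = end.tv_sec - beg.tv_sec;
    d.tv_nsec = end.tv_nsec - beg.tv_nsec;
    if (d.tv_nsec < 0) { --d.tv_sec; d.tv_nsec += 1000000000L; }
    return d;
}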
Step 2: Calculate starting wall clock time and tsc
uint64_t beg, end;
timespec ts;

// loop to ensure we aren't interrupted between the two tsc reads
while (1)
{
    beg = rdtscp();
    clock_gettime(CLOCK_REALTIME, &ts);
    end = rdtscp();

    if ((end - beg) <= 2000) // max ticks per clock call
        break;
}

_start_tsc        = end;
_start_clock_time = to_ns(ts); // converts timespec to ns since epoch
Step 3: Create a function which can return wall clock time from the tsc
uint64_t tsc_to_ns(uint64_t tsc)
{
    int64_t diff = tsc - _start_tsc;
    return _start_clock_time + (diff / _ticks_per_ns);
}
Step 4: Run in a loop, printing wall clock time from clock_gettime and from rdtscp
// lock the test to a single core
cpu_set_t mask;
CPU_ZERO(&mask);
CPU_SET(6, &mask);
sched_setaffinity(0, sizeof(cpu_set_t), &mask);

while (1)
{
    timespec utc_now;
    clock_gettime(CLOCK_REALTIME, &utc_now);

    uint64_t utc_ns = to_ns(utc_now);
    uint64_t tsc_ns = tsc_to_ns(rdtscp());

    int64_t ns_diff = tsc_ns - utc_ns; // signed, in case the tsc estimate falls behind

    std::cout << "clock_gettime " << ns_to_str(utc_ns) << '\n';
    std::cout << "tsc_time " << ns_to_str(tsc_ns) << " diff=" << ns_diff << "ns\n";

    sleep(10);
}
Output:
clock_gettime 11:55:34.824419837
tsc_time 11:55:34.824419840 diff=3ns
clock_gettime 11:55:44.826260245
tsc_time 11:55:44.826260736 diff=491ns
clock_gettime 11:55:54.826516358
tsc_time 11:55:54.826517248 diff=890ns
clock_gettime 11:56:04.826683578
tsc_time 11:56:04.826684672 diff=1094ns
clock_gettime 11:56:14.826853056
tsc_time 11:56:14.826854656 diff=1600ns
clock_gettime 11:56:24.827013478
tsc_time 11:56:24.827015424 diff=1946ns
Questions:
It is quickly evident that the times calculated in these two ways rapidly drift apart. I'm assuming that with constant_tsc and nonstop_tsc the tsc rate is constant.
Is this the on-board clock that is drifting? Surely it doesn't drift at this rate?
What is the cause of this drift?
Is there anything I can do to keep them in sync (other than very frequently recalculating _start_tsc and _start_clock_time in step 2)?
The relationship between the TSC and something like CLOCK_MONOTONIC will not be exactly unchanging. Even though you "calibrate" the TSC against CLOCK_MONOTONIC, the calibration will be out of date almost as soon as it is finished!

The reasons they won't stay in sync long term:

1. CLOCK_MONOTONIC is affected by NTP clock rate adjustments. NTP will constantly check network time and subtly slow down or speed up the system clock to match it. This results in some kind of oscillating pattern in the true CLOCK_MONOTONIC frequency, and so your calibration will always be slightly off, especially the next time NTP applies a rate adjustment. You could compare against CLOCK_MONOTONIC_RAW to eliminate this effect.

2. CLOCK_MONOTONIC and the TSC are almost certainly based on totally different underlying oscillators. It is often said that modern OSes use the TSC for timekeeping, but the TSC is only used to apply a small "local" offset to some other underlying slow-running clock to provide a very precise time (e.g., the "slow time" might be updated every timer tick, and the TSC is then used to interpolate between timer ticks). It is the slow underlying clock (something like the HPET or APIC clocks) that determines the longer-term behavior of CLOCK_MONOTONIC. The TSC itself, however, is an independent free-running clock, deriving its frequency from a different oscillator in a different place on the chipset/motherboard, with different natural fluctuations (in particular, a different response to temperature changes).

It is (2) that is the more fundamental of the two: it means that even without any NTP adjustments (or if you use a clock that is not subject to them), you'll see drift over time whenever the underlying clocks are based on different physical oscillators.
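To observe effect (1) in isolation, you can watch how far CLOCK_MONOTONIC wanders from CLOCK_MONOTONIC_RAW, since only the former is slewed by NTP. A minimal self-contained sketch (my own illustration, not part of the OP's code):

#include <cstdint>
#include <cstdio>
#include <ctime>
#include <unistd.h>

static int64_t ns_of(clockid_t id)
{
    timespec ts;
    clock_gettime(id, &ts);
    return int64_t(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}

int main()
{
    // Baseline offset between the slewed and raw monotonic clocks.
    int64_t base = ns_of(CLOCK_MONOTONIC) - ns_of(CLOCK_MONOTONIC_RAW);

    while (1)
    {
        sleep(10);
        int64_t off = ns_of(CLOCK_MONOTONIC) - ns_of(CLOCK_MONOTONIC_RAW);
        // Any movement relative to the baseline is NTP slewing CLOCK_MONOTONIC;
        // CLOCK_MONOTONIC_RAW is never adjusted.
        printf("slew vs raw: %+lld ns\n", (long long)(off - base));
    }
}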
Is this the on-board clock that is drifting? Surely it doesn't drift at this rate?

No, they shouldn't drift.

What is the cause of this drift?

The NTP service or a similar service that runs on your OS. It affects clock_gettime(CLOCK_REALTIME, ...).

Is there anything I can do to keep them in sync (other than very frequently recalculating _start_tsc and _start_clock_time in step 2)?

Yes, you can ease the problem.
1. You can try to use CLOCK_MONOTONIC instead of CLOCK_REALTIME.
2. You can model the difference as a linear function of time and apply it to compensate for the drift, as in the sketch below. It will not be completely reliable, because time services don't adjust the time as a linear function, but it will give you some more accuracy; you can readjust periodically.
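A minimal sketch of point 2, plugging into the OP's tsc_to_ns() from step 3 (the DriftModel type and helper names are illustrative, not an existing API):

#include <cstdint>

uint64_t tsc_to_ns(uint64_t tsc); // from step 3 of the question

// Hypothetical linear correction: observe the error of tsc_to_ns() at two
// points in time and extrapolate it between resyncs.
struct DriftModel
{
    uint64_t ref_tsc;   // TSC at the most recent observation
    int64_t  err_ns;    // tsc_to_ns() minus CLOCK_REALTIME at that point
    double   err_slope; // error growth in ns per TSC tick
};

DriftModel fit_drift(uint64_t tsc0, int64_t err0, uint64_t tsc1, int64_t err1)
{
    return DriftModel{ tsc1, err1,
                       double(err1 - err0) / double(tsc1 - tsc0) };
}

// Corrected wall-clock estimate: subtract the linearly extrapolated error.
uint64_t tsc_to_ns_corrected(uint64_t tsc, const DriftModel& m)
{
    int64_t predicted = m.err_ns + int64_t(m.err_slope * double(tsc - m.ref_tsc));
    return tsc_to_ns(tsc) - predicted;
}

Each err observation would be taken by sampling CLOCK_REALTIME and the TSC back-to-back (as in step 2) and computing tsc_to_ns(tsc) minus the sampled wall-clock ns; refitting every few seconds keeps the extrapolation error bounded.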
Some of the drift may come from calculating _ticks_per_ns inaccurately. You can check this by running your program several times: if the results are not reproducible, you are calculating _ticks_per_ns incorrectly. It is better to use a statistical method (such as the median) than just an average value.

Also note that you calculate _ticks_per_ns using CLOCK_MONOTONIC, which is itself related to the TSC. You then use CLOCK_REALTIME, which provides the system time; if your system runs NTP or a similar service, that time will be adjusted.

Your difference is around 2 microseconds per minute, i.e. 0.002 ms * 60 * 24 ≈ 2.9 milliseconds a day. That is great accuracy for a CPU clock: 3 ms a day is about 1 second a year.
The reason for the drift seen in the OP, at least on my machine, is that the TSC ticks per ns drifts away from its original value of _ticks_per_ns. The following results were from this machine; cat /proc/cpuinfo shows the constant_tsc and nonstop_tsc flags.

viewRates.cc can be run to see the current TSC ticks per ns on a machine:
rdtscp.h:
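A minimal sketch of what this header must provide, given how rdtscp() is used (the __rdtscp intrinsic is GCC/Clang's, from x86intrin.h; the original's exact contents may differ):

#pragma once
#include <cstdint>
#include <x86intrin.h> // __rdtscp intrinsic

// rdtscp waits for all earlier instructions to complete before reading
// the counter, unlike a plain rdtsc.
static inline uint64_t rdtscp()
{
    unsigned int aux; // receives IA32_TSC_AUX (identifies the core)
    return __rdtscp(&aux);
}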
viewRates.cc:
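A sketch consistent with the description above, repeatedly measuring the TSC tick rate over 10s windows and printing it (a reconstruction; the reference clock and window length are assumptions):

#include <cstdint>
#include <cstdio>
#include <ctime>
#include "rdtscp.h"

static uint64_t now_ns()
{
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC_RAW, &ts); // unaffected by NTP slewing
    return uint64_t(ts.tv_sec) * 1000000000ULL + ts.tv_nsec;
}

int main()
{
    while (1)
    {
        uint64_t beg_ns = now_ns(), beg_tsc = rdtscp();
        uint64_t end_ns, end_tsc;
        do
        {
            end_ns  = now_ns();
            end_tsc = rdtscp();
        }
        while (end_ns - beg_ns < 10000000000ULL); // 10s window

        printf("tsc ticks per ns: %.10f\n",
               double(end_tsc - beg_tsc) / double(end_ns - beg_ns));
    }
}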
linearExtrapolator.cc can be run to re-create the experiment of the OP:
linearExtrapolator.cc:
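Again a sketch rather than the original listing: calibrate _ticks_per_ns once, then repeatedly report elapsed ns, elapsed ticks, the ns_diff implied by the startup rate, and a fresh rate estimate from each 10s window:

#include <cstdint>
#include <cstdio>
#include <ctime>
#include "rdtscp.h"

static uint64_t now_ns()
{
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
    return uint64_t(ts.tv_sec) * 1000000000ULL + ts.tv_nsec;
}

// Measure TSC ticks per ns over one window, returning the window-end samples.
static double measure_rate(uint64_t window_ns, uint64_t& end_ns, uint64_t& end_tsc)
{
    uint64_t beg_ns = now_ns(), beg_tsc = rdtscp();
    do
    {
        end_ns  = now_ns();
        end_tsc = rdtscp();
    }
    while (end_ns - beg_ns < window_ns);
    return double(end_tsc - beg_tsc) / double(end_ns - beg_ns);
}

int main()
{
    // Single-window startup calibration (the OP takes a median of several).
    uint64_t start_ns, start_tsc;
    double ticks_per_ns = measure_rate(1000000000ULL, start_ns, start_tsc);
    printf("_ticks_per_ns = %.10f\n", ticks_per_ns);

    while (1)
    {
        // The 10s measurement window doubles as the wait between reports.
        uint64_t now, tsc;
        double current_rate = measure_rate(10000000000ULL, now, tsc);

        long elapsed_ns    = long(now - start_ns);
        long elapsed_ticks = long(tsc - start_tsc);
        long ns_diff       = long(double(elapsed_ticks) / ticks_per_ns) - elapsed_ns;

        printf("elapsed ns %ld, elapsed ticks %ld, ns_diff %ld, tsc ticks per ns %.11f\n",
               elapsed_ns, elapsed_ticks, ns_diff, current_rate);
    }
}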
Here is output from a run of viewRates immediately followed by linearExtrapolator:

The viewRates output shows that the TSC ticks per ns are decreasing fairly rapidly with time, corresponding to one of those steep drops in the plot above. The linearExtrapolator output shows, as in the OP, the difference between the elapsed ns as reported by clock_gettime() and the elapsed ns obtained by converting the elapsed TSC ticks to elapsed ns using the _ticks_per_ns == 2.8069831264 obtained at start time. Rather than a sleep(10); between each print-out of elapsed ns, elapsed ticks, ns_diff, I re-run the TSC ticks per ns calculation using a 10s window; this prints out the current tsc ticks per ns ratio. It can be seen that the trend of decreasing TSC ticks per ns observed in the viewRates output continues throughout the run of linearExtrapolator.

Dividing an elapsed ticks value by _ticks_per_ns and subtracting the corresponding elapsed ns gives the ns_diff, e.g.: (84211534141 / 2.8069831264) - 30000747550 = -20667. This is not 0, mainly due to the drift in TSC ticks per ns. If we had instead used the value of 2.80698015186 ticks per ns obtained from the last 10s interval, the result would be: (84211534141 / 2.80698015186) - 30000747550 = 11125. The additional error accumulated during that last 10s interval, -20667 - -10419 = -10248, nearly disappears when the correct TSC ticks per ns value is used for that interval: (84211534141 - 56141027929) / 2.80698015186 - (30000747550 - 20000496849) = 349.

If linearExtrapolator had been run at a time when the TSC ticks per ns was constant, the accuracy would be limited by how well the (constant) _ticks_per_ns had been determined, and then it would pay to take, e.g., a median of several estimates. If _ticks_per_ns were off by a fixed 40 parts per billion, a constant drift of about 400ns every 10 seconds would be expected, so ns_diff would grow/shrink by about 400 every 10 seconds.

genTimeSeriesofRates.cc can be used to generate data for a plot like the one above:

genTimeSeriesofRates.cc:
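A sketch of a generator for such data, printing one "elapsed seconds, ticks per ns" pair per 10s window (again an assumed reconstruction, not the original listing):

#include <cstdint>
#include <cstdio>
#include <ctime>
#include "rdtscp.h"

static uint64_t now_ns()
{
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
    return uint64_t(ts.tv_sec) * 1000000000ULL + ts.tv_nsec;
}

int main()
{
    uint64_t t0 = now_ns();
    while (1)
    {
        uint64_t beg_ns = now_ns(), beg_tsc = rdtscp();
        uint64_t end_ns, end_tsc;
        do
        {
            end_ns  = now_ns();
            end_tsc = rdtscp();
        }
        while (end_ns - beg_ns < 10000000000ULL);

        // Columns: elapsed seconds since start, TSC ticks per ns in the window.
        printf("%.1f %.10f\n", double(end_ns - t0) / 1e9,
               double(end_tsc - beg_tsc) / double(end_ns - beg_ns));
        fflush(stdout); // so the series can be plotted while it runs
    }
}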