std::chrono offers several clocks to measure time. At the same time, I guess the only way a CPU can evaluate time is by counting cycles.
Question 1: Does a CPU or a GPU have any other way to evaluate time than by counting cycles?
If that is the case, then because the way a computer counts cycles will never be as precise as an atomic clock, a "second" (period = std::ratio<1>) for a computer can actually be shorter or longer than an actual second, causing long-run drift between the computer's clock and, say, GPS time.
Question 2: Is that correct?
Some hardware has varying frequencies (for example, idle and turbo modes). In that case, the number of cycles elapsing in a second would vary.
Question 3: Is the "cycle count" measured by cpu and gpus varying depending on the hardware frequency? If yes, then how std::chrono
deal with it? If not, what does a cycle correspond to (like what is the "fundamental" time)? Is there a way to access the conversion at compile-time? Is there a way to access the conversion at runtime?
Counting cycles, yes, but cycles of what?
On a modern x86, the timesource used by the kernel (internally and for clock_gettime and other system calls) is typically a fixed-frequency counter that counts "reference cycles" regardless of turbo, power-saving, or clock-stopped idle. (This is the counter you get from rdtsc, or __rdtsc() in C/C++.)
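A minimal sketch of reading that counter, assuming an x86 target and GCC/Clang-style intrinsics (MSVC exposes the same intrinsic via <intrin.h>):

```cpp
#include <x86intrin.h>   // __rdtsc() with GCC/Clang
#include <cstdint>
#include <cstdio>

int main() {
    // Read the Time Stamp Counter twice; the difference is in "reference
    // cycles", which tick at a fixed rate on CPUs with an invariant TSC,
    // independent of turbo or power-saving states.
    uint64_t t0 = __rdtsc();
    uint64_t t1 = __rdtsc();
    std::printf("t0 = %llu, t1 - t0 = %llu reference cycles\n",
                (unsigned long long)t0, (unsigned long long)(t1 - t0));
}
```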
Normal std::chrono implementations will use an OS-provided function like clock_gettime on Unix. (On Linux, this can run purely in user space: the code plus scale-factor data live in a VDSO page that the kernel maps into every process's address space. Low-overhead timesources are nice; avoiding a user->kernel->user round trip helps a lot, especially with Meltdown + Spectre mitigations enabled.)

Profiling a tight loop that's not memory-bound might want to use actual core clock cycles, so that it's insensitive to the speed the current core happens to be running at (and doesn't have to worry about ramping the CPU up to max turbo, etc.), e.g. using perf stat ./a.out or perf record ./a.out. See for example: Can x86's MOV really be "free"? Why can't I reproduce this at all?

Some systems didn't / don't have a wall-clock-equivalent counter built right into the CPU, so either the OS would maintain a time in RAM that it updates on timer interrupts, or time-query functions would read the time from a separate chip.
(System call + hardware I/O = higher overhead, which is part of the reason that x86's rdtsc instruction morphed from a profiling thing into a clocksource thing.)

All of these clock frequencies are ultimately derived from a crystal oscillator on the mobo, but the scale factors used to extrapolate time from cycle counts can be adjusted to keep the clock in sync with atomic time, typically via the Network Time Protocol (NTP), as @Tony points out.
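To make the clock_gettime point above concrete, here's a small sketch, assuming Linux/glibc, where libstdc++'s steady_clock typically sits on top of CLOCK_MONOTONIC (an implementation detail, not something the standard guarantees):

```cpp
#include <chrono>
#include <ctime>     // clock_gettime, CLOCK_MONOTONIC (POSIX)
#include <cstdio>

int main() {
    // The OS/VDSO timesource, queried directly...
    timespec ts{};
    clock_gettime(CLOCK_MONOTONIC, &ts);

    // ...and via std::chrono, which on this kind of setup typically ends up
    // in the same clock_gettime fast path, so the two values should be close.
    auto now = std::chrono::steady_clock::now();

    long long raw_ns = ts.tv_sec * 1000000000LL + ts.tv_nsec;
    long long cxx_ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                           now.time_since_epoch()).count();
    std::printf("clock_gettime: %lld ns, steady_clock: %lld ns\n", raw_ns, cxx_ns);
}
```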
Regarding Question 1: different hardware may provide different facilities. For example, x86 PCs have employed several hardware timing facilities: for the last decade or so, x86 CPUs have had Time Stamp Counters operating at their processing frequency or, more recently, at some fixed frequency (a "constant rate", aka "invariant", TSC); there may be a High Precision Event Timer (HPET); and going back further there were Programmable Interval Timers (https://en.wikipedia.org/wiki/Programmable_interval_timer).
Regarding Question 2: yes, a computer without an atomic clock (they're now available on a chip) isn't going to be as accurate as an atomic clock. That said, services such as the Network Time Protocol allow you to maintain tighter coherence across a bunch of computers.
Regarding Question 3: that depends. For the TSC, newer "constant rate" TSC implementations don't vary; older ones do.
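A sketch of how you might check for that at runtime on x86, assuming GCC/Clang's <cpuid.h> (the "Invariant TSC" flag is CPUID leaf 0x80000007, EDX bit 8):

```cpp
#include <cpuid.h>
#include <cstdio>

int main() {
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    // __get_cpuid returns 0 if the requested leaf isn't supported.
    if (__get_cpuid(0x80000007, &eax, &ebx, &ecx, &edx)) {
        bool invariant_tsc = (edx >> 8) & 1;   // EDX bit 8: Invariant TSC
        std::printf("Invariant TSC: %s\n", invariant_tsc ? "yes" : "no");
    } else {
        std::printf("CPUID leaf 0x80000007 not supported\n");
    }
}
```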
As for how std::chrono deals with it: I'd expect most implementations to call an OS-provided time service, as the OS tends to have the best knowledge of, and access to, the hardware. There are a lot of factors that need to be considered - e.g. whether TSC readings are in sync across cores, what happens if the PC goes into some kind of sleep mode, what manner of memory fences are desirable around the TSC sampling, and so on.
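On that last point, here's a sketch of one commonly used fencing pattern around TSC sampling, assuming x86 and GCC/Clang intrinsics; it is one reasonable recipe, not the only one:

```cpp
#include <x86intrin.h>   // __rdtsc, __rdtscp
#include <emmintrin.h>   // _mm_lfence
#include <cstdint>
#include <cstdio>

// lfence keeps out-of-order execution from hoisting timed work above the
// first read or sinking it below; rdtscp waits for earlier instructions
// to finish before reading the counter.
static inline uint64_t tsc_begin() {
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}

static inline uint64_t tsc_end() {
    unsigned aux;                     // receives IA32_TSC_AUX (identifies the core)
    uint64_t t = __rdtscp(&aux);
    _mm_lfence();
    return t;
}

int main() {
    uint64_t t0 = tsc_begin();
    volatile uint64_t sink = 0;
    for (int i = 0; i < 100000; ++i) sink += i;   // region being timed
    uint64_t t1 = tsc_end();
    std::printf("reference cycles: %llu\n", (unsigned long long)(t1 - t0));
}
```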
For Intel CPUs, see this answer.
std::chrono::duration::count exposes the raw tick count for whatever time source was used, and you can duration_cast to other units of time (e.g. seconds). C++20 is expected to introduce further facilities like clock_cast. AFAIK there's no constexpr conversion available, which seems dubious anyway: a program might end up running on a machine with a different TSC rate than the machine it was compiled on.
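A small sketch of those facilities: count() gives raw ticks, duration_cast converts between units, and each clock's nominal tick period is available at compile time as a std::ratio. Note that this is the chrono clock's advertised period, not the hardware TSC rate:

```cpp
#include <chrono>
#include <cstdio>

int main() {
    using clock = std::chrono::steady_clock;

    // Compile-time: the clock's nominal tick period, as a std::ratio of seconds.
    // On mainstream implementations this is std::nano, i.e. 1/1000000000.
    constexpr long long num = clock::period::num;
    constexpr long long den = clock::period::den;

    auto t0 = clock::now();
    // ... work being timed ...
    auto t1 = clock::now();

    auto raw   = (t1 - t0).count();   // raw ticks, in units of clock::period
    auto in_us = std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();

    std::printf("period = %lld/%lld s, raw ticks = %lld, microseconds = %lld\n",
                num, den, (long long)raw, (long long)in_us);
}
```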