An HR timers precision case study

In this topic I would like to discuss HR timers and the precision they actually deliver.

I have studied a lot of documentation about them, and I am confident they are the best and most reliable way to delay execution inside Linux kernel modules, with the lowest CPU cost and the greatest timing precision (some time-critical drivers use them too, like this one: https://dev.openwrt.org/browser/trunk/target/linux/generic/files/drivers/pwm/gpio-pwm.c?rev=35328 ).

Do you agree?

Here is one of the most comprehensive and detailed documents I have seen on this topic: https://www.landley.net/kdocs/ols/2006/ols2006v1-pages-333-346.pdf .

HR timers promise a resolution finer than one jiffy, but unfortunately on my system I did not get the expected results for delays shorter than 6 ms (more details below).

My environment is:

Windows 10 Pro 64 bit / 8 GB RAM / Intel CPU, 4 cores
VMWare Player 12
Virtualized OS Linux Mint 18.1 64 bit
Kernel configuration
- Version: 4.10.0-24-generic
- CONFIG_HIGH_RES_TIMERS=y
- CONFIG_POSIX_TIMERS=y
- CONFIG_NO_HZ_COMMON=y
- CONFIG_NO_HZ_IDLE=y
- CONFIG_NO_HZ=y
- CONFIG_HZ_250=y
- CONFIG_HZ=250
- /sys/devices/system/clocksource/clocksource0/available_clocksource => tsc hpet acpi_pm
- /sys/devices/system/clocksource/clocksource0/current_clocksource => tsc
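
For reference, the CONFIG_HZ value above fixes the length of one jiffy, which is the resolution that classic kernel timers are limited to and that HR timers are supposed to beat. A minimal sketch of that arithmetic (the function name jiffy_period_us is only illustrative, not a kernel API):

```c
#include <stdio.h>

/* Length of one jiffy (one timer tick) in microseconds for a given CONFIG_HZ. */
long jiffy_period_us(long hz)
{
    return 1000000L / hz;
}

/* Print the jiffy length for a given HZ, e.g. 250 as configured above. */
void print_jiffy_period(long hz)
{
    printf("HZ=%ld -> one jiffy = %ld us\n", hz, jiffy_period_us(hz));
}
```

With HZ=250 one jiffy lasts 4000 us (4 ms), so any requested delay below 4 ms is already under jiffy resolution on this kernel.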

To run a benchmark I wrote a Linux kernel module, which I published freely at https://bitbucket.org/DareDevilDev/hr-timers-tester/ . The README file contains the instructions to compile and run it yourself.

It executes a series of cycles as follows:

10 uS .. 90 uS, increment by 10 uS
100 uS .. 900 uS, increment by 100 uS
1 ms .. 9 ms, increment by 1 ms
10 ms .. 90 ms, increment by 10 ms
100 ms .. 900 ms, increment by 100 ms
and finally 1 s
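
The series above can be generated with two nested loops. This is only a sketch of the sequence, not the code from the published module; build_delay_table and its nanosecond unit are assumptions made for illustration:

```c
#include <stddef.h>

/*
 * Fill out[] with the requested delays in nanoseconds:
 * 9 steps in each decade from 10 us to 100 ms, then a final 1 s.
 * Returns the number of entries written (at most cap).
 */
size_t build_delay_table(long long *out, size_t cap)
{
    size_t n = 0;

    /* base is the decade step: 10 us, 100 us, 1 ms, 10 ms, 100 ms */
    for (long long base = 10000LL; base <= 100000000LL; base *= 10) {
        for (int k = 1; k <= 9; k++) {
            if (n == cap)
                return n;
            out[n++] = base * k;
        }
    }

    /* final sample: 1 s */
    if (n < cap)
        out[n++] = 1000000000LL;

    return n;
}
```

Five decades of nine steps plus the last one give 46 delays in total, which is also the size the pre-allocated results array needs.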

The timings are measured with the "ktime_get" function and stored in a pre-allocated array, both for speed and to avoid unwanted delays inside the HR timer callback.
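
As a cross-check outside the kernel, the same measurement pattern can be sketched in user space. This is not the module's code: it assumes clock_nanosleep() as the delay source (on Linux it is backed by HR timers) and clock_gettime(CLOCK_MONOTONIC) in place of ktime_get; the names now_ns and measure_delays are hypothetical. User-space results also include scheduling latency, so they are only an upper bound on what the kernel timer itself does.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Monotonic time in nanoseconds (user-space stand-in for ktime_get). */
int64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Sleep for each requested delay and store the elapsed time in
 * measured_ns[], which the caller allocates up front so that nothing
 * but the sleep happens between the two clock reads.
 * Returns 0 on success, or the error number from clock_nanosleep().
 */
int measure_delays(const int64_t *requested_ns, int64_t *measured_ns, int count)
{
    for (int i = 0; i < count; i++) {
        struct timespec req = {
            .tv_sec = requested_ns[i] / 1000000000LL,
            .tv_nsec = requested_ns[i] % 1000000000LL,
        };
        int64_t start = now_ns();
        int err = clock_nanosleep(CLOCK_MONOTONIC, 0, &req, NULL);

        if (err)
            return err;
        measured_ns[i] = now_ns() - start;
    }
    return 0;
}
```

Printing is left to the caller and should happen only after all samples are collected, for the same reason the module defers its output.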

After collecting the data, the module prints out the table of samples.

The relevant data for my scenario are:

   10 uS =      41082 nS
   20 uS =      23955 nS
   30 uS =     478361 nS
   40 uS =      27341 nS
   50 uS =     806875 nS
   60 uS =     139721 nS
   70 uS =     963793 nS
   80 uS =      39475 nS
   90 uS =     175736 nS
  100 uS =    1096272 nS
  200 uS =      10099 nS
  300 uS =     967644 nS
  400 uS =     999006 nS
  500 uS =    1025254 nS
  600 uS =    1125488 nS
  700 uS =     982296 nS
  800 uS =    1011911 nS
  900 uS =     978652 nS
 1000 uS =    1985231 nS
 2000 uS =    1984367 nS
 3000 uS =    2068547 nS
 4000 uS =    5000319 nS
 5000 uS =    4144947 nS
 6000 uS =    6047991 nS <= First expected delay!
 7000 uS =    6835180 nS
 8000 uS =    8057504 nS
 9000 uS =    9218573 nS
10000 uS =   10435313 nS

... and so on ...

As you can see in the kernel log dump above, 6 ms is the first delay from which the measured values consistently match the requested ones.

I repeated the same test on my C.H.I.P. embedded system ( https://getchip.com/pages/chip ), a Raspberry Pi-like ARM-based board running at 1 GHz and equipped with Ubuntu 14.04 (kernel 4.4.13, HZ = 200).

In this case I got better results (the left column is the requested delay in µs):

  30 =      44666 nS
  40 =      24125 nS
  50 =      49208 nS
  60 =      60208 nS
  70 =      70042 nS
  80 =      78334 nS
  90 =      89708 nS
 100 =     126083 nS
 200 =     184917 nS
 300 =     302917 nS <= First expected delay!
 400 =     395000 nS
 500 =     515333 nS
 600 =     591583 nS
 700 =     697458 nS
 800 =     800875 nS
 900 =     900125 nS
1000 =    1013375 nS

...and so on ...

On that cheaper board, good results start from 300 µs.
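
To make "first expected delay" less subjective, the two tables can be checked against a fixed tolerance. The sketch below copies the samples shown above and looks for the smallest requested delay from which every later sample stays within the tolerance. The 5% threshold and the names within_tolerance and stable_from_us are my assumptions, and only the samples listed in this post are covered (both dumps are truncated).

```c
#include <stddef.h>

struct sample { long long requested_us; long long measured_ns; };

/* Virtualized x86 run (HZ = 250), copied from the first table. */
const struct sample vm_run[] = {
    {10, 41082}, {20, 23955}, {30, 478361}, {40, 27341}, {50, 806875},
    {60, 139721}, {70, 963793}, {80, 39475}, {90, 175736},
    {100, 1096272}, {200, 10099}, {300, 967644}, {400, 999006},
    {500, 1025254}, {600, 1125488}, {700, 982296}, {800, 1011911},
    {900, 978652}, {1000, 1985231}, {2000, 1984367}, {3000, 2068547},
    {4000, 5000319}, {5000, 4144947}, {6000, 6047991}, {7000, 6835180},
    {8000, 8057504}, {9000, 9218573}, {10000, 10435313},
};

/* C.H.I.P. run (HZ = 200), copied from the second table. */
const struct sample chip_run[] = {
    {30, 44666}, {40, 24125}, {50, 49208}, {60, 60208}, {70, 70042},
    {80, 78334}, {90, 89708}, {100, 126083}, {200, 184917},
    {300, 302917}, {400, 395000}, {500, 515333}, {600, 591583},
    {700, 697458}, {800, 800875}, {900, 900125}, {1000, 1013375},
};

/* Non-zero if the measured delay is within +/- percent of the request. */
int within_tolerance(struct sample s, int percent)
{
    long long want = s.requested_us * 1000;
    long long err = s.measured_ns > want ? s.measured_ns - want
                                         : want - s.measured_ns;

    return err * 100 <= want * percent;
}

/*
 * Smallest requested delay (in us) from which this and all later samples
 * are within tolerance; -1 if even the last sample is out of tolerance.
 */
long long stable_from_us(const struct sample *run, size_t n, int percent)
{
    long long result = -1;

    /* Walk backwards and stop at the first sample that misses. */
    for (size_t i = n; i-- > 0; ) {
        if (!within_tolerance(run[i], percent))
            break;
        result = run[i].requested_us;
    }
    return result;
}
```

With a 5% tolerance this gives 6000 µs for the virtualized run and 300 µs for the C.H.I.P., which agrees with the samples marked in the two dumps. The 2000 µs sample of the virtualized run is also within 5%, but its neighbours are not, which is why the check requires all later samples to pass.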

What is your opinion? Is there a better way to get more precision from HR timers in a platform-independent way? Or are HR timers the wrong solution for precise timing (which is mandatory when writing hardware drivers)?

Any contribution would be much appreciated.

Thank you!

Problem solved: the issue was caused by the virtualization environment.

On an old laptop (HP, single core, 1.9 GHz) I got good delays from 60 µs upward, and on a newer one (Dell, quad core) I got good delays even below 10 µs!