With this topic I would like to discuss HR timers and their real-world precision.
I have studied a lot of documentation about them and became confident that they are the best and most reliable solution for delaying execution inside Linux kernel modules, with the lowest CPU cost and the highest timing precision (e.g. some time-critical drivers use them too, like this one: https://dev.openwrt.org/browser/trunk/target/linux/generic/files/drivers/pwm/gpio-pwm.c?rev=35328 ).
Do you agree?
Here is one of the most comprehensive and detailed documents I have ever seen on this topic: https://www.landley.net/kdocs/ols/2006/ols2006v1-pages-333-346.pdf .
HR timers promise sub-jiffy resolution, but unfortunately on my system I did not get the expected results for delays shorter than 6 ms (more details below).
My environment is:
- Host: Windows 10 Pro 64-bit / 8 GB RAM / Intel 4-core CPU
- VMware Player 12
- Virtualized OS: Linux Mint 18.1 64-bit
Kernel configuration:
- Version: 4.10.0-24-generic
- CONFIG_HIGH_RES_TIMERS=y
- CONFIG_POSIX_TIMERS=y
- CONFIG_NO_HZ_COMMON=y
- CONFIG_NO_HZ_IDLE=y
- CONFIG_NO_HZ=y
- CONFIG_HZ_250=y
- CONFIG_HZ=250
/sys/devices/system/clocksource/clocksource0/available_clocksource => tsc hpet acpi_pm
/sys/devices/system/clocksource/clocksource0/current_clocksource => tsc
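As a quick sanity check, independent of the benchmark module, the kernel-reported resolution of CLOCK_MONOTONIC can be queried from userspace with clock_getres(): with high-resolution timers actually in effect it typically reports 1 nS, while without them it falls back to 1/HZ (4 ms with HZ=250). A minimal hypothetical helper (check_res.c is my own name, not part of the module):

    /* check_res.c - hypothetical userspace sanity check: query the
     * resolution the kernel reports for CLOCK_MONOTONIC. With high-res
     * timers active this is typically 1 nS; otherwise ~1/HZ. */
    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        struct timespec res;

        if (clock_getres(CLOCK_MONOTONIC, &res) != 0) {
            perror("clock_getres");
            return 1;
        }
        printf("CLOCK_MONOTONIC resolution: %ld.%09ld s\n",
               (long)res.tv_sec, res.tv_nsec);
        return 0;
    }

Compile with "gcc check_res.c -o check_res" and run it inside the VM.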
To benchmark this, I wrote a Linux kernel module that I published freely at https://bitbucket.org/DareDevilDev/hr-timers-tester/ . The README file contains the instructions to compile and run it yourself.
It executes a series of cycles as follows (a sketch of the corresponding delay table is shown right after this list):
- 10 uS .. 90 uS, increment by 10 uS
- 100 uS .. 900 uS, increment by 100 uS
- 1 ms .. 9 ms, increment by 1 ms
- 10 ms .. 90 ms, increment by 10 ms
- 100 ms .. 900 ms, increment by 100 ms
- and finally 1 s
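This is not the published module's source, just a hypothetical C array mirroring the sweep above, to make the step structure explicit:

    /* Hypothetical table of requested delays, in microseconds,
     * matching the sweep described above. */
    static const unsigned int delays_us[] = {
            10, 20, 30, 40, 50, 60, 70, 80, 90,
            100, 200, 300, 400, 500, 600, 700, 800, 900,
            1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000,
            10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000,
            100000, 200000, 300000, 400000, 500000, 600000, 700000, 800000, 900000,
            1000000 /* 1 s */
    };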
The timings are measured with the "ktime_get" function and stored in a pre-allocated array, for better performance and to avoid unwanted delays inside the hrtimer callback.
After collecting the data, the module prints out the sampled-data table.
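For reference, here is a minimal, self-contained sketch of the measurement idea (not the published module itself; the names are made up for illustration): arm a one-shot hrtimer, read ktime_get() in the callback, and compare the elapsed time with the requested delay.

    /* hrt_sketch.c - hypothetical stripped-down illustration of the
     * measurement approach: one-shot hrtimer + ktime_get() in the callback. */
    #include <linux/module.h>
    #include <linux/hrtimer.h>
    #include <linux/ktime.h>

    static struct hrtimer test_timer;
    static ktime_t t_armed;                /* timestamp taken when arming   */
    static u64 requested_ns = 10 * 1000;   /* hypothetical 10 uS test delay */

    /* Callback: record the real elapsed time as early as possible. */
    static enum hrtimer_restart test_cb(struct hrtimer *t)
    {
            s64 elapsed = ktime_to_ns(ktime_sub(ktime_get(), t_armed));

            pr_info("requested %llu nS, measured %lld nS\n",
                    requested_ns, elapsed);
            return HRTIMER_NORESTART;
    }

    static int __init hrt_sketch_init(void)
    {
            hrtimer_init(&test_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
            test_timer.function = test_cb;

            t_armed = ktime_get();
            hrtimer_start(&test_timer, ns_to_ktime(requested_ns),
                          HRTIMER_MODE_REL);
            return 0;
    }

    static void __exit hrt_sketch_exit(void)
    {
            hrtimer_cancel(&test_timer);
    }

    module_init(hrt_sketch_init);
    module_exit(hrt_sketch_exit);
    MODULE_LICENSE("GPL");

The real module repeats this for every entry of the delay sweep and stores the results in a pre-allocated array instead of printing from the callback.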
The relevant data for my scenario are:
10 uS = 41082 nS
20 uS = 23955 nS
30 uS = 478361 nS
40 uS = 27341 nS
50 uS = 806875 nS
60 uS = 139721 nS
70 uS = 963793 nS
80 uS = 39475 nS
90 uS = 175736 nS
100 uS = 1096272 nS
200 uS = 10099 nS
300 uS = 967644 nS
400 uS = 999006 nS
500 uS = 1025254 nS
600 uS = 1125488 nS
700 uS = 982296 nS
800 uS = 1011911 nS
900 uS = 978652 nS
1000 uS = 1985231 nS
2000 uS = 1984367 nS
3000 uS = 2068547 nS
4000 uS = 5000319 nS
5000 uS = 4144947 nS
6000 uS = 6047991 nS <= First expected delay!
7000 uS = 6835180 nS
8000 uS = 8057504 nS
9000 uS = 9218573 nS
10000 uS = 10435313 nS
... and so on ...
As you can see in the above kernel log dump, 6 ms is the first expected delay sample.
I repeated the same test on my C.H.I.P. embedded system ( https://getchip.com/pages/chip ), a Raspberry Pi-like ARM-based board running at 1 GHz and equipped with Ubuntu 14.04 (kernel 4.4.13, HZ = 200).
In this case I got better results:
30 uS = 44666 nS
40 uS = 24125 nS
50 uS = 49208 nS
60 uS = 60208 nS
70 uS = 70042 nS
80 uS = 78334 nS
90 uS = 89708 nS
100 uS = 126083 nS
200 uS = 184917 nS
300 uS = 302917 nS <= First expected delay!
400 uS = 395000 nS
500 uS = 515333 nS
600 uS = 591583 nS
700 uS = 697458 nS
800 uS = 800875 nS
900 uS = 900125 nS
1000 uS = 1013375 nS
... and so on ...
On that cheaper board, good results start from 300 uS.
What is your opinion? Is there a better way to get more precision from HR timers in a platform-independent way? Are HR timers the wrong solution for precise timing (which is mandatory when we have to write hardware drivers)?
Any contribution would be greatly appreciated.
Thank you!
Problem solved: it was an issue caused by the virtualization environment.
On an old laptop (HP single-core 1.9 GHz) I got good delays starting from 60 uS, and on a newer one (Dell quad-core) I got good delays even below 10 uS!