Real time Linux: disable local timer interrupts

2020-02-26 11:39发布

问题:

TL;DR : Using Linux kernel real time with NO_HZ_FULL I need to isolate a process in order to have deterministic results but /proc/interrupts tell me there is still local timer interrupts (among other). How to disable it?

Long version :

I want to make sure my program is not being interrupt so I try to use a real time Linux kernel. I'm using the real time version of arch Linux (linux-rt on AUR) and I modified the configuration of the kernel to selection the following options :

CONFIG_NO_HZ_FULL=y
CONFIG_NO_HZ_FULL_ALL=y
CONFIG_RCU_NOCB_CPU=y
CONFIG_RCU_NOCB_CPU_ALL=y

then I reboot my computer to boot on this real time kernel with the folowing options:

nmi_watchdog=0
rcu_nocbs=1
nohz_full=1
isolcpus=1

I also disable the following option in the BIOS :

C state
intel speed step
turbo mode
VTx
VTd
hyperthreading

My CPU (i7-6700 3.40GHz) has 4 cores (8 logical CPU with hyperthreading technology) I can see CPU0, CPU1, CPU2, CPU3 in /proc/interrupts file.

CPU1 is isolated by isolcpus kernel parameter and I want to disable the local timer interrupts on this CPU. I though real-time kernel with CONFIG_NO_HZ_FULL and CPU isolation (isolcpus) was enough to do it and I try to check by running theses command :

cat /proc/interrupts | grep LOC > ~/tmp/log/overload_cpu1
taskset -c 1 ./overload
cat /proc/interrupts | grep LOC >> ~/tmp/log/overload_cpu1

where the overload process is:

***overload.c:***
int main()
{
  for(int i=0;i<100;++i)
    for(int j=0;j<100000000;++j);
}

The file overload_cpu1 contains the result:

LOC:     234328        488      12091      11299   Local timer interrupts
LOC:     239072        651      12215      11323   Local timer interrupts

meanings 651-488 = 163 interrupts from local timer and not 0...

For comparison I do the same experiment but I change the core where my process overload run (I keep watching interrupts on CPU1):

taskset -c 0 :   8 interrupts
taskset -c 1 : 163 interrupts
taskset -c 2 :   7 interrupts
taskset -c 3 :   8 interrupts

One of my question is why there is no 0 interrupts ? why the number of interrupts is bigger when my process run on CPU1 ? (I mean I though NO_HZ_FULL will prevent interrupt if my process was alone : "The CONFIG_NO_HZ_FULL=y Kconfig option causes the kernel to avoid sending scheduling-clock interrupts to CPUs with a single runnable task"(https://www.kernel.org/doc/Documentation/timers/NO_HZ.txt)

Maybe an explaination is there is other process running on CPU1. I checked by using ps command :

CLS CPUID RTPRIO PRI  NI CMD                           PID
TS      1      -  19   0 [cpuhp/1]                      18
FF      1     99 139   - [migration/1]                  20
TS      1      -  19   0 [rcuc/1]                       21
FF      1      1  41   - [ktimersoftd/1]                22
TS      1      -  19   0 [ksoftirqd/1]                  23
TS      1      -  19   0 [kworker/1:0]                  24
TS      1      -  39 -20 [kworker/1:0H]                 25
FF      1      1  41   - [posixcputmr/1]                28
TS      1      -  19   0 [kworker/1:1]                 247
TS      1      -  39 -20 [kworker/1:1H]                501

As you can see, there is threads on the CPU1. Is that possible to disable these processes ? I guess it is because if it is not the case, NO_HZ_FULL will never work right ?

Tasks with class TS doesn't disturb me because they didn't have priority among SCHED_FIFO and I can set this policy to my program. Same things for tasks with class FF and priority less than 99.

However, you can see migration/1 that is in SCHED_FIFO and priority 99. Maybe these process can causes interrupts when they run . This explain the few interrupts when my process in on CPU0, CPU2 and CPU3 (respectively 8,7 and 8 interrupts) but it also mean these processes are not running very often and then doesn't explain why there is many interrupts when my process run on CPU1 (163 interrupts).

I also do the same experiment but with the SCHED_FIFO of my overload process and I get:

taskset -c 0 : 1
taskset -c 1 : 4063
taskset -c 2 : 1
taskset -c 3 : 0

In this configuration there is more interrupts in the case my process use SCHED_FIFO policy on CPU1 and less on other CPU. do you know why ?

回答1:

The thing is that a full-tickless CPU (a.k.a. adaptive-ticks, configured with nohz_full=) still receives some ticks.

Most notably the scheduler requires a timer on an isolated full tickless CPU for updating some state every second or so.

This is a documented limitation (as of 2019):

Some process-handling operations still require the occasional scheduling-clock tick. These operations include calculating CPU load, maintaining sched average, computing CFS entity vruntime, computing avenrun, and carrying out load balancing. They are currently accommodated by scheduling-clock tick every second or so. On-going work will eliminate the need even for these infrequent scheduling-clock ticks.

(source: Documentation/timers/NO_HZ.txt, cf. the LWN article (Nearly) full tickless operation in 3.10 from 2013 for some background)

A more accurate method to measure the local timer interrupts (LOC row in /proc/interrupts) is to use perf. For example:

$ perf stat -a -A -e irq_vectors:local_timer_entry ./my_binary

Where my_binary has threads pinned to the isolated CPUs that non-stop utilize the CPU without invoking syscalls - for - say 2 minutes.

There are other sources of additional local timer ticks (when there is just 1 runnable task).

For example, the collection of VM stats - by default they are collected each seconds. Thus, I can decrease my LOC interrupts by setting a higher value, e.g.:

# sysctl vm.stat_interval=60

Another source are periodic checks if the TSC on the different CPUs doesn't drift - you can disable those with the following kernel option:

tsc=reliable

(Only apply this option if you really know that your TSCs don't drift.)

You might find other sources by recording traces with ftrace (while your test binary is running).

Since it came up in the comments: Yes, the SMI is fully transparent to the kernel. It doesn't show up as NMI. You can only detect an SMI indirectly.