Intel MSR frequency scaling per-thread

Posted 2020-07-10 07:08

Question:

I'm extending the Linux kernel in order to control the frequency of some threads: when they are scheduled onto a core (any core!), the core's frequency is changed by writing the proper P-state to the register IA32_PERF_CTL, as suggested in Intel's manual. But when different threads with different "custom" frequencies are scheduled, it appears that the throughput of all the threads increases, as if all the cores were running at the highest frequency that was set.
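For readers who want to observe the same register from user space, here is a minimal Python sketch. It assumes the Linux `msr` driver is loaded (`modprobe msr`), which exposes each core's MSRs as `/dev/cpu/<n>/msr`, and that the script runs as root. The helper names (`msr_path`, `read_msr`, `write_msr`) and the `root` parameter are made up for illustration; the encoding of the P-state value is model-specific and deliberately not shown.

```python
import os
import struct

# MSR addresses from Intel's manual.
IA32_PERF_STATUS = 0x198  # current P-state as reported by the hardware
IA32_PERF_CTL = 0x199     # requested (target) P-state


def msr_path(cpu, root="/dev/cpu"):
    """Path of the msr device node for one logical CPU (root is overridable for testing)."""
    return f"{root}/{cpu}/msr"


def read_msr(cpu, reg, root="/dev/cpu"):
    """Read a 64-bit MSR: the register address is used as the file offset."""
    fd = os.open(msr_path(cpu, root), os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)


def write_msr(cpu, reg, value, root="/dev/cpu"):
    """Write a 64-bit value to an MSR on the given CPU."""
    fd = os.open(msr_path(cpu, root), os.O_WRONLY)
    try:
        os.pwrite(fd, struct.pack("<Q", value), reg)
    finally:
        os.close(fd)
```

Comparing `read_msr(cpu, IA32_PERF_CTL)` with `read_msr(cpu, IA32_PERF_STATUS)` on each core is one way to check whether the hardware actually honors a per-core request.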

I ran many trials and measurements under different load conditions and configurations, but the result is always the same. After some trials with CPUFreq (with no application running, I set a different frequency on each core, and in the end the frequencies measured with cpufreq-info -w were all equal), I wonder whether the CPU cores can really run at different, independent frequencies, or whether there are hardware policies or constraints.

Finally, is there a CPU model which makes this fine-grained frequency scaling feasible?

The CPU I am using is an Intel Core i5 750.

Answer 1:

You cannot control the frequencies of individual active cores. You can, however, set all active cores to the same frequency. The reasons are in the previous answers: all cores are on the same active voltage plane. Hopefully, the next-gen Haswell processors will make it possible to control each core separately.



Answer 2:

I think you're missing a big piece of the picture!

Read up on power and clock domains. All processor cores within a domain run at the same P-state (i.e., the same frequency and voltage). The P-state that all cores in a domain run at is always the highest-performance (highest-frequency) P-state requested by any core in that domain. The MSRs don't reflect this at all, nor do the interfaces that the kernel exposes.
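The coordination rule can be illustrated with a small Python model. This is only a sketch of the rule as stated here, not of any real hardware interface: the function name `effective_frequencies`, the frequencies in MHz, and the convention that an idle core has a request of `None` are all assumptions made for the example.

```python
def effective_frequencies(requested, domains):
    """Model per-domain P-state coordination.

    requested: dict mapping core id -> requested frequency in MHz,
               or None for a core that is idle and makes no request.
    domains:   list of lists of core ids sharing one clock/voltage domain.
    Returns a dict mapping each active core to the frequency it really runs at.
    """
    effective = {}
    for domain in domains:
        active = [c for c in domain if requested.get(c) is not None]
        if not active:
            continue  # nothing in this domain is running
        # Every active core in the domain gets the highest request.
        top = max(requested[c] for c in active)
        for core in active:
            effective[core] = top
    return effective


requests = {0: 1200, 1: 2667, 2: 1600, 3: None}

# One shared domain: the single highest request wins for every active core.
shared = effective_frequencies(requests, [[0, 1, 2, 3]])

# One domain per core: each core keeps its own request.
per_core = effective_frequencies(requests, [[0], [1], [2], [3]])
```

With a single shared domain, `shared` is `{0: 2667, 1: 2667, 2: 2667}`, which matches the symptom in the question: every thread speeds up to the highest frequency set. With one domain per core, `per_core` is `{0: 1200, 1: 2667, 2: 1600}`.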

Anandtech has a good article on this: http://www.anandtech.com/show/2658/2

"This is all very similar to AMD's Phenom, but where the two differ is in how they handle power management. While AMD will allow individual cores to request different clock speeds, Nehalem attempts to run all of its cores at the same frequency; if one core is idle then it's simply power gated and the core is effectively turned off."

I haven't hooked a power-meter up to SB/IB, but my guess is that the behavior is the same.



Answer 3:

cpufreq-info will display information about which cores need to be synchronous in their P-states:

[root@navi ~]# cpufreq-info
cpufrequtils 008: cpufreq-info (C) Dominik Brodowski 2004-2009
Report errors and bugs to cpufreq@vger.kernel.org, please.
analyzing CPU 0:
  driver: acpi-cpufreq
  CPUs which run at the same hardware frequency: 0 1 <---- THIS
  CPUs which need to have their frequency coordinated by software: 0 <--- and THIS
  maximum transition latency: 10.0 us.

For that reason alone, I'd recommend going through the cpufreq interfaces instead of setting registers directly. Doing so also makes it possible to run on non-Intel CPUs, which might have unusual requirements.
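The same two lists can be read programmatically from the cpufreq sysfs directory. The sketch below assumes the usual Linux layout, `/sys/devices/system/cpu/cpu<N>/cpufreq/`, with the files `related_cpus` (CPUs sharing one hardware frequency) and `affected_cpus` (CPUs needing software coordination), each holding space-separated CPU numbers. The function names and the `sysfs_root` parameter are hypothetical.

```python
from pathlib import Path


def read_cpu_list(path):
    """Parse a sysfs file holding space-separated CPU numbers, e.g. '0 1'."""
    return [int(token) for token in Path(path).read_text().split()]


def coordination_info(cpu, sysfs_root="/sys/devices/system/cpu"):
    """Return the coordination lists that cpufreq reports for one CPU."""
    base = Path(sysfs_root) / f"cpu{cpu}" / "cpufreq"
    return {
        # CPUs which run at the same hardware frequency
        "related_cpus": read_cpu_list(base / "related_cpus"),
        # CPUs which need their frequency coordinated by software
        "affected_cpus": read_cpu_list(base / "affected_cpus"),
    }
```

If `related_cpus` for a core lists more than that core itself, setting a different frequency on each of those cores cannot be expected to take effect independently.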

Also check how to pin kernel threads to a specific core, to avoid unexpected switching, if you haven't done so already.
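For user-space test programs (not kernel threads, which need an in-kernel mechanism), pinning can be sketched in Python with `os.sched_setaffinity`, which is available on Linux only. The helper name `pin_current_thread` is made up for this example.

```python
import os


def pin_current_thread(cpu):
    """Restrict the calling thread to a single CPU (Linux only).

    Returns the previous affinity set so that the caller can restore it.
    """
    previous = os.sched_getaffinity(0)  # pid 0 means the calling thread
    os.sched_setaffinity(0, {cpu})
    return previous


# Pin to the lowest-numbered CPU we are allowed to run on, then restore.
allowed = sorted(os.sched_getaffinity(0))
saved = pin_current_thread(allowed[0])
pinned = os.sched_getaffinity(0)
os.sched_setaffinity(0, saved)
```

Pinning a benchmark thread this way keeps it from migrating between cores while frequencies are being measured.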



Answer 4:

I want to thank everyone for their contributions! After investigating further, I found some more details to share with the community.

As suggested, Nehalem places all the cores in a single clock domain, so that the highest frequency set on any core is applied to all of them; some tools may show different frequencies on idle cores, but running any application is enough to make the frequency rise to the maximum. According to my tests, this also applies to Sandy Bridge, where the cores and LLC slices all reside in the same frequency/voltage domain. I assume that Ivy Bridge behaves the same way, as it is only a 'tick' iteration. By contrast, I believe that Haswell places cores and LLC slices in separate, individual domains, thus enabling per-core frequencies. This is also advertised on several pages, such as http://www.anandtech.com/show/8423/intel-xeon-e5-version-3-up-to-18-haswell-ep-cores-/4