I have a server with 2 NUMA nodes with 16 CPUs each. I can see all 32 CPUs in Task Manager: the first 16 (NUMA node 1) in the first 2 rows and the next 16 (NUMA node 2) in the last 2 rows.
In my app I am starting 64 threads using Thread.Start(). The app is CPU-intensive, but when I run it only the first 16 CPUs are busy; the other 16 CPUs are idle.
Why? I am using Interlocked.Increment() a lot. Could this be the reason?
Is there a way I can start threads on a specific NUMA node?
In addition to gcServer, we should enable GCCpuGroup and Thread_UseAllCpuGroups, so the config should look more like:
<configuration>
<runtime>
<gcServer enabled="true"/>
<GCCpuGroup enabled="true"/>
<Thread_UseAllCpuGroups enabled="true"/>
</runtime>
</configuration>
GCCpuGroup enables garbage collection across multiple CPU groups, and Thread_UseAllCpuGroups lets the runtime distribute managed threads across all CPU groups.
The first thing to check is indeed the app.config, making sure the necessary options are set:
<?xml version="1.0" encoding="utf-8"?>
<configuration>
<runtime>
<gcServer enabled="true" />
<Thread_UseAllCpuGroups enabled="true" />
<GCCpuGroup enabled="true" />
</runtime>
<startup>
<!-- 4.5 and later should work, use the one targeted -->
<supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.6.2"/>
</startup>
</configuration>
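To see whether the settings above were actually picked up, a small check along these lines may help. It is only a sketch: the class name RuntimeCheck is made up for illustration, and what Environment.ProcessorCount reports on a multi-group machine depends on the runtime version and settings, so treat that number as a hint rather than proof.

```csharp
using System;
using System.Runtime;

class RuntimeCheck
{
    static void Main()
    {
        // True when the server GC is actually in use for this process.
        Console.WriteLine("Server GC:       " + GCSettings.IsServerGC);

        // Number of logical processors the runtime reports to the app.
        Console.WriteLine("Processor count: " + Environment.ProcessorCount);

        // A 32-bit process has a narrower affinity mask, so worth confirming.
        Console.WriteLine("64-bit process:  " + Environment.Is64BitProcess);
    }
}
```

If Server GC prints False, the config file is probably not being read (wrong file name, or not deployed next to the executable).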
If the app.config wizardry isn't helping, it is likely that your machine uses multiple kernel groups (Kgroups) when it shouldn't. In that case, check your BIOS for NUMA Group Size Optimization if you have a Gen9 HP. If it is in Clustered mode, the current CLR (2017, .NET 4.6.2) only utilizes the first group. If you have no more than 64 cores in that machine, you should be able to select the Flat layout, which puts all cores in the same group. If you cannot find the option, you may need a BIOS update.
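To check whether Windows has split the machine into more than one processor group, you can ask the kernel directly. The sketch below P/Invokes the Win32 functions GetActiveProcessorGroupCount and GetActiveProcessorCount from kernel32.dll (available since Windows 7 / Server 2008 R2). The class name ProcessorGroups is just for illustration, and it is not the diagnostics tool from the linked question.

```csharp
using System;
using System.Runtime.InteropServices;

class ProcessorGroups
{
    // WORD GetActiveProcessorGroupCount(void);
    [DllImport("kernel32.dll")]
    static extern ushort GetActiveProcessorGroupCount();

    // DWORD GetActiveProcessorCount(WORD GroupNumber);
    [DllImport("kernel32.dll")]
    static extern uint GetActiveProcessorCount(ushort groupNumber);

    // Win32 constant: count processors across all groups.
    const ushort ALL_PROCESSOR_GROUPS = 0xFFFF;

    static void Main()
    {
        ushort groups = GetActiveProcessorGroupCount();
        Console.WriteLine("Processor groups: " + groups);

        // List how many logical processors each group holds.
        for (ushort g = 0; g < groups; g++)
        {
            Console.WriteLine("  group " + g + ": " +
                GetActiveProcessorCount(g) + " logical processors");
        }

        Console.WriteLine("Total: " + GetActiveProcessorCount(ALL_PROCESSOR_GROUPS));
    }
}
```

On the 32-CPU machine from the question, two groups of 16 would point to the Clustered layout, while a single group of 32 would suggest the problem lies elsewhere.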
For a lot more detail, see "Unable to use more than one processor group for my threads in a C# app" here on StackOverflow. It even comes with its own diagnostics tool.
Have you set the garbage collector to the server version?
In app.config, try:
<configuration>
<runtime>
<gcServer enabled="true"/>
</runtime>
</configuration>
Because of the way the heaps are allocated, the server GC makes a massive difference when churning through a lot of objects/data on many threads on a machine with many cores.