Why is my .NET app only using a single NUMA node?

Posted 2019-01-24 05:42

Question:

I have a server with 2 NUMA nodes with 16 CPUs each. I can see all 32 CPUs in Task Manager: the first 16 (NUMA node 1) in the first 2 rows and the next 16 (NUMA node 2) in the last 2 rows.

In my app I start 64 threads using Thread.Start(). The app is CPU intensive, but when I run it only the first 16 CPUs are busy; the other 16 CPUs are idle.

Why? I am using Interlocked.Increment() a lot. Could this be a reason? Is there a way I can start threads on a specific NUMA node?

Answer 1:

In addition to gcServer, we should enable GCCpuGroup and Thread_UseAllCpuGroups, so the config should look more like:

<configuration>
   <runtime>
      <gcServer enabled="true"/>
      <GCCpuGroup enabled="true"/>
      <Thread_UseAllCpuGroups enabled="true"/>
   </runtime>
</configuration>

GCCpuGroup makes garbage collection support multiple CPU groups, and Thread_UseAllCpuGroups lets the runtime distribute managed threads across all CPU groups.



Answer 2:

The first thing to check is indeed app.config. Make sure the necessary options are set:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
    <runtime>
        <gcServer enabled="true" />
        <Thread_UseAllCpuGroups enabled="true" />
        <GCCpuGroup enabled="true" />
    </runtime>
    <startup> 
        <!-- 4.5 and later should work, use the one targeted -->
        <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.6.2"/>       
    </startup>
</configuration>

If the app.config wizardry isn't helping, it is likely that your machine uses multiple kernel groups (Kgroups) when it shouldn't. In that case, check your BIOS for NUMA Group Size Optimization if you have a Gen9 HP machine. If it is in Clustered mode, the current CLR (2017, .NET 4.6.2) only utilizes the first group. If you have no more than 64 cores in that machine, you should be able to select the Flat layout, which puts all cores in the same group. If you cannot find the setting, you may need a BIOS update.
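To see whether Windows has split the machine into more than one processor group, and whether server GC is actually active in the process, a small console program along these lines may help. This is only a minimal sketch: the class name CpuGroupReport is made up for illustration, and it assumes Windows 7 / Server 2008 R2 or later, where kernel32 exports GetActiveProcessorGroupCount and GetActiveProcessorCount.

```csharp
using System;
using System.Runtime;
using System.Runtime.InteropServices;

// Hypothetical helper, not part of any library: prints GC mode and processor group layout.
static class CpuGroupReport
{
    // Win32: WORD GetActiveProcessorGroupCount(void);
    [DllImport("kernel32.dll")]
    private static extern ushort GetActiveProcessorGroupCount();

    // Win32: DWORD GetActiveProcessorCount(WORD GroupNumber);
    [DllImport("kernel32.dll")]
    private static extern uint GetActiveProcessorCount(ushort groupNumber);

    static void Main()
    {
        // True only if <gcServer enabled="true"/> took effect for this process.
        Console.WriteLine("Server GC: " + GCSettings.IsServerGC);

        // What the runtime itself reports as usable logical processors.
        Console.WriteLine("Environment.ProcessorCount: " + Environment.ProcessorCount);

        // What Windows reports: number of groups and logical processors per group.
        ushort groups = GetActiveProcessorGroupCount();
        Console.WriteLine("Processor groups: " + groups);
        for (ushort g = 0; g < groups; g++)
        {
            Console.WriteLine("  group " + g + ": " + GetActiveProcessorCount(g) + " logical processors");
        }
    }
}
```

If this reports more than one group on a machine with 64 or fewer logical processors, the BIOS group layout described above is the likely suspect. If it reports a single group, the cause is probably elsewhere.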

For a lot more detail, see "Unable to use more than one processor group for my threads in a C# app" here on Stack Overflow. It even comes with its own diagnostics tool.



Answer 3:

Have you set the garbage collector to the server version?

In app.config, try:

<configuration>
   <runtime>
      <gcServer enabled="true"/>
   </runtime>
</configuration>

Because of the way the heaps are allocated, the server GC makes a massive difference when churning through a lot of objects/data on many threads on a machine with many cores.