OpenMP doesn't utilize all CPUs (dual socket, w

Posted 2020-04-14 08:57

Question:

I have a dual-socket system with 22 physical cores (44 hyperthreads) per CPU. I can get OpenMP to fully utilize the first CPU (22 cores/44 hyperthreads), but I cannot get it to utilize the second CPU.

I am using CPUID HWMonitor to check core usage. The second CPU always sits at or near 0% on all cores.

Using:

int nProcessors = omp_get_max_threads();

I get nProcessors = 44, but I think OpenMP is only using the 44 hyperthreads of one CPU instead of all 44 physical cores (which should be 88 hyperthreads).

Despite a lot of searching, I still don't know how to utilize the other CPU.

The CPUs themselves are working fine: other parallel programs I run do utilize all of them.

I'm compiling for 64-bit, though I don't think that matters. I'm using Visual Studio 2017 Professional, version 15.2, with OpenMP 2.0 (the only version Visual Studio supports). The machine runs Windows 10 Pro 64-bit with two Intel Xeon E5-2699 v4 @ 2.2 GHz processors.

Answer 1:

Answering my own question, with thanks to @AlexG for providing some insight (see the comments on the question).

This is a Microsoft Visual Studio and Windows problem.

First read Processor Groups for Windows.

Basically, if you have 64 or fewer logical processors, this is not a problem. Once you go past 64, however, Windows splits the logical processors into processor groups, for example one group per socket (or however else Windows chooses to partition them). In my case there were exactly two processor groups, each containing 44 hyperthreads and corresponding to one physical CPU socket.

By default, every process (program) is only given access to one processor group, which is why I could initially use only 44 threads on one socket. However, if you take your threads and call SetThreadGroupAffinity to move a thread to a processor group other than the one your program was initially assigned, your program becomes a multi-group process. This seems like a roundabout way to enable multiple processors, but it is how it's done. A call to GetProcessGroupAffinity will show that the number of groups becomes greater than 1 once you start setting each thread's processor group.
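To check the group layout described above, a small sketch like the following could help. GetActiveProcessorGroupCount and GetActiveProcessorCount are real Windows APIs (Windows 7 and later); the helper names min_processor_groups and processor_group_sizes are hypothetical, and the non-Windows branch is only a placeholder so the sketch compiles elsewhere. The 64-per-group limit comes from the width of a group's affinity mask (KAFFINITY) on x64.

```cpp
#include <cassert>
#include <thread>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#endif

// A processor group holds at most 64 logical processors, because a group's
// affinity mask (KAFFINITY) is 64 bits wide on x64.
constexpr unsigned kMaxPerGroup = 64;

// Lower bound on the number of groups needed for n logical processors.
unsigned min_processor_groups(unsigned logical_processors) {
    assert(logical_processors > 0);
    return (logical_processors + kMaxPerGroup - 1) / kMaxPerGroup;
}

// Number of active logical processors in each processor group.
std::vector<unsigned> processor_group_sizes() {
    std::vector<unsigned> sizes;
#ifdef _WIN32
    WORD groups = GetActiveProcessorGroupCount();
    for (WORD g = 0; g < groups; ++g)
        sizes.push_back(GetActiveProcessorCount(g));
#else
    // Placeholder: no processor groups outside Windows, so report one group.
    unsigned n = std::thread::hardware_concurrency();
    sizes.push_back(n == 0 ? 1 : n);
#endif
    return sizes;
}
```

For 88 logical processors, min_processor_groups(88) is 2. If the answer's description holds, processor_group_sizes() on the asker's machine would likely return two entries of 44, one per socket, though that has not been verified here.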

I was able to create an OpenMP parallel region like this and assign each thread to a processor group:

#include <windows.h>
#include <omp.h>
#include <stdio.h>

...

#pragma omp parallel num_threads( 88 )
{
    // Each thread needs its own copies: sharing one GROUP_AFFINITY, buffer,
    // or previous-affinity struct across threads is a data race.
    HANDLE thread = GetCurrentThread();
    int tid = omp_get_thread_num();
    GROUP_AFFINITY affinity = {0};   // zero-initializes Reserved[], as required
    GROUP_AFFINITY previousAffinity;
    char buf[64];

    // Assumes at most 64 threads (the Visual Studio OpenMP limit noted below):
    // threads 0-31 go to group 0, threads 32-63 to group 1, one logical
    // processor each.
    affinity.Group = (WORD)(tid < 32 ? 0 : 1);
    // Shift a KAFFINITY (64-bit on x64), not an int: 1 << 31 overflows int.
    affinity.Mask = (KAFFINITY)1 << (tid % 32);

    if (SetThreadGroupAffinity(thread, &affinity, &previousAffinity))
    {
        snprintf(buf, sizeof(buf), "Thread %d set to group %u\n", tid, (unsigned)affinity.Group);
        OutputDebugStringA(buf);
    }
}
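The group/mask arithmetic in that region can be pulled into a portable helper and tested off Windows. This is a sketch under stated assumptions: GroupSlot and slot_for_thread are hypothetical names, and on Windows you would copy the two fields into a zero-initialized GROUP_AFFINITY before calling SetThreadGroupAffinity. Group sizes are passed in (for example {32, 32} to mirror the split above), and masks are built from a 64-bit one so that bit 31 and above are set correctly.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Portable stand-in for the Group/Mask fields of Windows' GROUP_AFFINITY.
// GroupSlot is a hypothetical name used only for illustration.
struct GroupSlot {
    unsigned short group;  // processor group number (a WORD in GROUP_AFFINITY)
    std::uint64_t mask;    // exactly one bit set: the logical processor in the group
};

// Map thread number tid onto one logical processor, filling group 0 first,
// then group 1, and so on. Threads beyond the total capacity wrap around,
// so they share logical processors with earlier threads.
GroupSlot slot_for_thread(int tid, const std::vector<unsigned>& per_group) {
    assert(tid >= 0 && !per_group.empty());
    unsigned total = 0;
    for (unsigned n : per_group) total += n;
    assert(total > 0);

    unsigned i = static_cast<unsigned>(tid) % total;
    unsigned short g = 0;
    while (i >= per_group[g]) {  // skip whole groups until i falls inside one
        i -= per_group[g];
        ++g;
    }
    assert(i < 64);  // a group never holds more than 64 logical processors
    // Shift a 64-bit one: shifting a plain int by 31 or more overflows.
    return GroupSlot{g, std::uint64_t{1} << i};
}
```

With {32, 32}, thread 32 maps to group 1, bit 0, matching the tid < 32 split. Passing the real sizes {44, 44} instead would fill the first socket completely before using the second, so with 64 threads it would give an uneven 44/20 split.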

With the parallel region above, I was able to force 64 threads to run, 32 per socket. However, I couldn't get more than 64 threads, even when I tried forcing omp_set_num_threads to 88. The reason seems to be that Visual Studio's implementation of OpenMP does not allow more than 64 OpenMP threads. Here's a link on that for more information.
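The 64-thread ceiling accounts for the 32-per-socket result: once the team is capped, an even contiguous split over two groups puts 32 threads in each, leaving 12 of each socket's 44 logical processors idle. The sketch below only models that arithmetic; kAssumedTeamCap encodes the limit reported in this answer, not a documented constant, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Assumption from the answer: MSVC's OpenMP runtime creates at most 64 threads.
constexpr int kAssumedTeamCap = 64;

// Team size actually obtained when `requested` threads are asked for.
int effective_team_size(int requested) {
    return std::min(requested, kAssumedTeamCap);
}

// Threads placed in group g when `team` threads are dealt out in contiguous
// blocks over `groups` groups; block sizes differ by at most one.
int threads_in_group(int team, int groups, int g) {
    assert(team >= 0 && groups > 0 && g >= 0 && g < groups);
    return team / groups + (g < team % groups ? 1 : 0);
}

// Group that thread `tid` falls into under the same contiguous split.
int group_of_thread(int tid, int team, int groups) {
    assert(tid >= 0 && tid < team);
    int first = 0;
    for (int g = 0;; ++g) {
        int next = first + threads_in_group(team, groups, g);
        if (tid < next) return g;
        first = next;
    }
}
```

For a request of 88 threads, effective_team_size(88) is 64, and group_of_thread(tid, 64, 2) returns 0 for threads 0-31 and 1 for threads 32-63, the same assignment as the tid < 32 test in the parallel region.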

Thanks all for helping glean some more tidbits that helped in the eventual answer!