I'm running Matlab R2014a on a node in a Linux cluster that has 20 cores and hyperthreading enabled. I know this has been discussed before, but I'm looking for some clarification. Here's my understanding of the threads vs. cores issue in Matlab:
- Matlab has inherent multithreading capabilities, and will utilize extra cores on a multicore machine.
- Matlab runs its threads in such a way that putting multiple Matlab threads on the same core (i.e. hyperthreading) isn't useful. So by default, the maximum number of threads that Matlab will create is the number of cores on your system.
- When using parpool(), regardless of the number of workers you create, each worker will use only one physical core, as mentioned in this thread.
However, I've also read that using the (deprecated) function maxNumCompThreads(), you can either decrease or increase the number of threads that Matlab or one of the workers will generate. This can be useful in several scenarios:
- You want to utilize Matlab's implicit multithreading capabilities to run some code on a cluster node without allocating the entire node. It would be nice if there was some other way to do this if maxNumCompThreads ever gets removed.
- You want to do a parameter sweep but have fewer parameters than the number of cores on your machine. In this case you might want to increase the number of threads per worker so that all of your cores are utilized. This was suggested recently in this thread.
However, in my experience, while the individual workers seem quite happy to use maxNumCompThreads() to increase their thread count, inspecting the actual CPU usage with the "top" command suggests that it has no effect, i.e. each worker still only gets to use one core. It's possible that the individual Matlab processes spawned by the parpool are run with the argument -singleCompThread. I've confirmed that if the parent Matlab process is run with -singleCompThread, the command maxNumCompThreads(n), where n > 1, throws an error because Matlab is running in single-threaded mode. So the result seems to be that (at least in 2014a) you can't increase the number of computational threads on the parallel pool workers.
Related to this, I can't seem to get the parent Matlab process to start more threads than there are cores, even though the computer itself has hyperthreading enabled. Again, it will happily run maxNumCompThreads(n), where n > # physical cores, but the fact that top shows CPU utilization at 50% suggests otherwise. So what is happening, or what am I misunderstanding?
Edit: to lay out my questions more explicitly:
- Within a parfor loop, why doesn't setting maxNumCompThreads(n), where n > 1, seem to work? If it's because the worker process is started with -singleCompThread, why doesn't maxNumCompThreads() return an error like it does in a parent process started with -singleCompThread?
- In the parent process, why doesn't using maxNumCompThreads(n), where n > # physical cores, do anything?
Note: I posted this previously on Matlab answers, but haven't received any feedback.
Edit2: It looks like the problem in (1) was an issue with the test code I was using.
That's quite a long question, but I think the straightforward answer is that yes, as I understand it, MATLAB workers are started with -singleCompThread.
First, a few quick tests to confirm our understanding:
> matlab.exe -singleCompThread
>> warning('off', 'MATLAB:maxNumCompThreads:Deprecated')
>> maxNumCompThreads
ans =
1
>> maxNumCompThreads(2)
Error using feature
MATLAB has computational multithreading disabled.
To enable multithreading please restart MATLAB without singleCompThread option.
Error in maxNumCompThreadsHelper (line 37)
Error in maxNumCompThreads (line 27)
lastn = maxNumCompThreadsHelper(varargin{:});
As indicated, when MATLAB is started with the -singleCompThread option, we cannot override it using maxNumCompThreads.
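The check above can also be done programmatically. The following is a minimal sketch, assuming only the behaviour shown in the transcript (the call errors when multithreading is disabled); the variable names `requested`, `previous` and `canOverride` are illustrative.

```matlab
% Probe whether the computational thread limit can be raised in this session.
warning('off', 'MATLAB:maxNumCompThreads:Deprecated');

requested = 2;   % illustrative value
try
    % Setting the limit returns the previous limit.
    previous = maxNumCompThreads(requested);
    canOverride = true;
catch err
    % With -singleCompThread the set call errors, as in the transcript.
    previous = maxNumCompThreads;   % querying is still allowed
    canOverride = false;
    fprintf('Cannot change thread limit: %s\n', err.message);
end

fprintf('Thread limit now: %d (was %d)\n', maxNumCompThreads, previous);
```

If `canOverride` comes back false, the session was most likely started in single-threaded mode and has to be restarted to change that.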
> matlab.exe
>> parpool(2); % local pool
>> spmd, n = maxNumCompThreads, end
Lab 1:
n =
1
Lab 2:
n =
1
We can see that each worker is by default limited to a single computation thread. This is a good thing because we want to avoid over-subscription and unnecessary context switches, which occur when the number of threads trying to run exceeds the number of available physical/logical cores. So in theory, the best way to maximize CPU utilization is to start as many single-threaded workers as we have cores.
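As a rough sketch of the "one single-threaded worker per core" idea, something like the following could be used. It relies on the undocumented `feature('numcores')` call, which is assumed here to report the number of physical cores MATLAB detects; the variable names are illustrative.

```matlab
warning('off', 'MATLAB:maxNumCompThreads:Deprecated');

% Assumption: feature('numcores') is undocumented and is taken here to
% return the number of physical cores detected by MATLAB.
nCores = feature('numcores');

% Reuse an existing pool if there is one, otherwise open one worker per core.
pool = gcp('nocreate');
if isempty(pool)
    pool = parpool(nCores);
end

% Each worker reports its own computational thread limit (1 by default).
spmd
    nThreads = maxNumCompThreads;
end
for k = 1:pool.NumWorkers
    fprintf('Worker %d: %d computational thread(s)\n', k, nThreads{k});
end
```

The pool size may be capped by the cluster profile, so the number of workers actually started is worth checking via `pool.NumWorkers`.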
Now, by looking at the local worker processes running in the background, we see that each is launched as:
matlab.exe -dmlworker -noFigureWindows [...]
I believe the undocumented -dmlworker option does something similar to -singleCompThread, but probably a bit different. For one, I was able to override it using maxNumCompThreads(2) without it throwing an error like before.
Remember that even if a MATLAB session is running in single-threaded computation mode, that doesn't mean the computational thread is pinned to a single CPU core (the OS scheduler may move the thread between cores). You'll have to set the affinity of the worker processes if you want to control that.
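For what it's worth, here is one way affinity could be set from inside the workers on a Linux node. This is only a sketch under several assumptions: the util-linux `taskset` tool is on the PATH, the undocumented `feature('getpid')` returns the worker's process ID, and the worker-to-CPU mapping (worker k on logical CPU k-1) is a hypothetical choice for illustration.

```matlab
% Pin each pool worker to one logical CPU (Linux only, needs taskset).
pool = gcp('nocreate');
if isempty(pool)
    pool = parpool(2);   % pool size is illustrative
end

spmd
    pid  = feature('getpid');   % undocumented; assumed to be this worker's PID
    core = labindex - 1;        % hypothetical mapping: worker k -> logical CPU k-1
    % -a applies the change to all threads of the process, -c takes a CPU list
    [status, msg] = system(sprintf('taskset -a -cp %d %d', core, pid));
end

for k = 1:pool.NumWorkers
    fprintf('Worker %d: exit status %d\n%s', k, status{k}, msg{k});
end
```

Note that a worker pinned to a single logical CPU gains nothing from a higher maxNumCompThreads setting, since all of its threads then share that CPU.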
So I did some profiling using Intel VTune Amplifier. Basically I ran some linear algebra code, and performed hotspots analysis by attaching to the MATLAB process and filtering on the mkl.dll module (this is the Intel MKL library that MATLAB uses as an optimized BLAS/LAPACK implementation). Here are my results:
- Serial mode
I used the following code: eig(rand(500));
  - Starting MATLAB normally: the computation spawns 4 threads (the default automatic value, given that I have a quad-core Intel i7 CPU).
  - Starting MATLAB normally, but calling maxNumCompThreads(1) before the computation: as expected, only 1 thread is used by the computation.
  - Starting MATLAB with the -singleCompThread option: again, only 1 thread is used.
- Parallel mode (parpool)
I used the following code: parpool(2); spmd, eig(rand(500)); end. In both cases below, MATLAB is started normally.
  - When running code on the workers with the default settings, each worker is limited to one computation thread.
  - When I override the settings on the workers using maxNumCompThreads(2), each worker uses 2 threads.
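Without a profiler, a coarse way to see the effect of the thread limit in serial mode is to time the same computation under different settings. This is a minimal sketch, assuming MATLAB was started normally (not with -singleCompThread); the matrix size and the list of limits are illustrative, and wall-clock timing is only a rough proxy for what VTune measures.

```matlab
warning('off', 'MATLAB:maxNumCompThreads:Deprecated');

A = rand(500);                  % same problem size as in the tests above
limits = [1 2];                 % thread limits to compare (illustrative)
elapsed = zeros(size(limits));

original = maxNumCompThreads;   % remember the current limit
for k = 1:numel(limits)
    maxNumCompThreads(limits(k));
    tic;
    eig(A);
    elapsed(k) = toc;
    fprintf('%d thread(s): %.3f s\n', limits(k), elapsed(k));
end
maxNumCompThreads(original);    % restore the original limit
```

For a problem this small the difference can be lost in noise, so repeating the measurement or using a larger matrix gives a clearer picture.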
Here is a screenshot of what VTune reports:
Hope that answers your questions :)
I was wrong about maxNumCompThreads not working on parpool workers. It looks like the problem was that the code I was using:
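parfor j = 1:2
    tic
    maxNumCompThreads(2);                       % raise the limit on this worker
    workersCompThreads(j) = maxNumCompThreads;  % record the limit now in effect
    i = 1;
    while toc < 200
        % matrix size grows tenfold on every pass: 10x10, 100x100, ...
        a = randn(10^i)*randn(10^i);
        i = i + 1;
    end
end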
used so much memory by the time I checked CPU utilization that the bottleneck was I/O and the extra threads were already shut down. When I did the following:
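parfor j = 1:2
    tic
    maxNumCompThreads(2);                       % raise the limit on this worker
    workersCompThreads(j) = maxNumCompThreads;  % record the limit now in effect
    i = 4;
    while toc < 200
        % fixed 10^4-by-10^4 matrices, so memory use stays constant
        a = randn(10^i)*randn(10^i);
    end
end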
the extra threads started and stayed running.
As for the second issue, I got confirmation from MathWorks that the parent Matlab process won't start more threads than the number of physical cores, even if you explicitly raise the limit beyond that. So in the documentation, the sentence:
"Currently, the maximum number of computational threads is equal to the number of computational cores on your machine."
should say:
"Currently, the maximum number of computational threads is equal to the number of physical cores on your machine."