I seem to be one of the few people using Matlab Coder (the codegen command) for speedup, judging by how little discussion or help there is online. I've gotten incredible speedups from it in some cases. I've never seen it documented, but when I make a MEX file using codegen from a Matlab script with a parfor loop, it will often thread the resulting MEX. Parfor in functions spawns multiple processes, which is often less efficient than just threading (I'm inferring all this from watching top in Linux: I see multiple 100% processes for Matlab functions, but a single e.g. 1000% process when running the converted MEX). I'm working on a case now where I could really use the speedup, but I see no evidence of multiple threads being used in the MEX, even though parfor is working in the base function. Does anyone know what the hangup might be, or how the coder chooses when to thread?
It will only thread the parfor loop itself; it would be dangerous for the coder to guess at other opportunities, and impossible for it to calculate where parallelism would be safe.
If I were you, I would put parfor everywhere in the Matlab code that it can legitimately go.

To decide whether a loop is a good candidate for parallelization:
- Does it perform I/O in any form? If so, don't parallelize it: the I/O will slow it down and remove any determinism from the code.
- Is there actually a loop for parfor to replace? If not, you'll have to live with the performance, because there may be nothing to parallelize.
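As a rough sketch (the function and variable names here are my own, not from the question), a loop like the following is typically the kind of parfor candidate codegen can thread, since each iteration writes only its own output slice and does no I/O:

```matlab
function y = sumrows(A) %#codegen
% Each iteration is independent: it reads its own row of A and writes
% only its own element of y, with no I/O in the loop body, so the
% generated MEX can safely run the iterations on multiple threads.
n = size(A, 1);
y = zeros(n, 1);
parfor i = 1:n
    y(i) = sum(A(i, :));
end
end
```

Generating the MEX with something like `codegen sumrows -args {zeros(1000, 1000)}` should then produce a threaded MEX. If the loop body instead called disp, fprintf, or file I/O, or if iterations depended on one another, the coder would likely fall back to serial execution, which may be the hangup the questioner is seeing.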