The Armadillo C++ linear algebra library documentation states that one of the reasons for developing the library in C++ is "ease of parallelisation via OpenMP present in modern C++ compilers", yet the Armadillo code itself does not use OpenMP. How can I gain the benefits of parallelisation with Armadillo? Is this achieved by using one of the high-speed LAPACK and BLAS replacements? My platform is Linux with an Intel processor, but I suspect there is a generic answer to this question.
Okay, so it appears that parallelisation is indeed achieved by using the high-speed LAPACK and BLAS replacements. On Ubuntu 12.04 I installed OpenBLAS using the package manager and built the Armadillo library from source. The programs in the `examples` folder built and ran, and I can control the number of cores using the `OPENBLAS_NUM_THREADS` environment variable.

I created a small project, openblas-benchmark, which measures the performance increase of Armadillo when computing a matrix product C=A*B for matrices of various sizes, but so far I could only test it on a 2-core machine.
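For reference, here is a minimal sketch of the kind of timing code such a benchmark can use; the matrix size `n` is illustrative and not taken from the actual openblas-benchmark project. The multiply `A * B` is dispatched by Armadillo to whichever BLAS is linked in (here OpenBLAS), which is where the multithreading happens:

```cpp
#include <armadillo>
#include <chrono>
#include <iostream>

int main()
{
    const arma::uword n = 1024;   // illustrative size; vary to sweep matrix dimensions

    // random dense matrices
    arma::mat A(n, n, arma::fill::randu);
    arma::mat B(n, n, arma::fill::randu);

    auto t0 = std::chrono::steady_clock::now();
    arma::mat C = A * B;          // forwarded to BLAS dgemm; OpenBLAS parallelises this
    auto t1 = std::chrono::steady_clock::now();

    // touch the result so the multiply cannot be optimised away
    std::cout << "C(0,0) = " << C(0, 0) << "\n";
    std::cout << "elapsed: "
              << std::chrono::duration<double>(t1 - t0).count() << " s\n";
    return 0;
}
```

Assuming Armadillo is linked against OpenBLAS, you can compare thread counts without recompiling, e.g. `g++ -O2 bench.cpp -o bench -larmadillo` followed by `OPENBLAS_NUM_THREADS=1 ./bench` versus `OPENBLAS_NUM_THREADS=2 ./bench`.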
The performance plot shows nearly a 50% reduction in execution time for matrices larger than 512x512. Note that both axes are logarithmic; each grid line on the y axis represents a doubling of execution time.

![Performance plot](https://i.stack.imgur.com/v49Cs.png)