I have a code to compute a Gaussian Mixture Model with Expectation Maximization in order to identify the clusters from a given input data sample.
A piece of the code is repeating the computation of such model for a number of trials Ntrials (one indepenendet of the other but using the same input data) in order to finally pick up the best solution (the one maximizing the total likelihood from the model). This concept can be generalized to many other clustering algorithms (e.g. k-means).
I want to parallelize the part of the code that has to be repeated Ntrials times through multi-threading with C++11 such that each thread will execute one trial.
A code example, assuming an input Eigen::ArrayXXd sample
of (Ndimensions x Npoints) can be of the type:
double bestTotalModelProbability = 0;
Eigen::ArrayXd clusterIndicesFromSample(Npoints);
clusterIndicesFromSample.setZero();
for (int i=0; i < Ntrials; i++)
{
totalModelProbability = computeGaussianMixtureModel(sample);
// Check if this trial is better than the previous one.
// If so, update the results (cluster index for each point
// in the sample) and keep them.
if totalModelProbability > bestTotalModelProbability
{
bestTotalModelProbability = totalModelProbability;
...
clusterIndicesFromSample = obtainClusterMembership(sample);
}
}
where I pass the reference value of sample (Eigen::Ref), and not sample itself to both the functions computeGaussianMixtureModel() and obtainClusterMembership().
My code is heavily based on Eigen array, and the N-dimensional problems that I take can account for order 10-100 dimensions and 500-1000 different sample points. I am looking for some examples to create a multi-threaded version of this code using Eigen arrays and std:thread of C++11, but could not find anything around and I am struggling with making even some simple examples for manipulation of Eigen arrays.
I am not even sure Eigen can be used within std::thread in C++11. Can someone help me even with some simple example to understand the synthax? I am using clang++ as compiler in Mac OSX on a CPU with 6 cores (12 threads).
OP's question attracted my attention because number-crunching with speed-up earned by multi-threading is one of the top todo's on my personal list.
I must admit that my experience with the Eigen library is very limited. (I once used the decompose of 3×3 rotation matrices to Euler angles which is very clever solved in a general way in the Eigen library.)
Hence, I defined another sample task consisting of a stupid counting of values in a sample data set.
This is done multiple times (using the same evaluation function):
test-multi-threading.cc
:Compiled and tested on cygwin64 on Windows 10:
I did the same on coliru.com. (I had to reduce the heat up cycles and the sample size as I exceeded the time limit with the original values.):
Live Demo on coliru
I wonder a little bit that the ratios on coliru (with only 4 threads) are even better than on my PC with (with 8 threads). Actually, I don't know how to explain this. However, there are a lot of other differences in the two setups which may or may not be responsible. At least, both measurements show a rough speed-up of 3 for 3rd and 4th approach where the 2nd consumes uniquely every potential speed-up (probably by starting and joining all these threads).
Looking at the sample code, you will recognize that there is no mutex or any other explicit locking. This is intentionally. As I've learned (many, many years ago), every attempt for parallelization may cause a certain extra amount of communication overhead (for concurrent tasks which have to exchange data). If communication overhead becomes to big, it simply consumes the speed advantage of concurrency. So, best speed-up can be achieved by:
In my sample code, I
Concerning 3. I struggled a bit whether this is legal or not i.e. is it granted for data which is written in threads to appear correctly in main thread after joining. (The fact that something seems to work fine is illusive in general but especially illusive concerning multi-threading.)
cppreference.com provides the following explanations
for
std::thread::thread()
for
std::thread::join()
In Stack Overflow, I found the following related Q/A's:
which convinced me, it is OK.
However, the drawback is that
An alternative approach could be the usage of a thread pool to overcome this. I googled a bit and found e.g. Jakob Progsch's ThreadPool on github. However, I guess, with a thread pool the locking issue is back “in the game”.
Whether this will work for Eigen functions as well, depends on how the resp. Eigen functions are implemented. If there are accesses to global variables in them (which become shared when the same function is called concurrently), this will cause a data race.
Googling a bit, I found the following doc.
Eigen and multi-threading – Using Eigen in a multi-threaded application: