Imagine a classic OMP task:
- Summing a large vector of doubles in the range [0.0, 1.0)
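#include <algorithm>   // generate_n
#include <cstddef>     // size_t
#include <functional>  // bind
#include <iostream>
#include <iterator>    // back_inserter
#include <random>
#include <vector>
using namespace std;
int main() {
    vector<double> v;
    // generate some data
    generate_n(back_inserter(v), 1ul << 18,
               bind(uniform_real_distribution<double>(0, 1.0), default_random_engine { random_device {}() }));
    long double sum = 0;
    {
        // each thread accumulates a private partial sum; OMP combines them at the end
        #pragma omp parallel for reduction(+:sum)
        for (size_t i = 0; i < v.size(); i++)
        {
            sum += v[i];
        }
    }
    std::cout << "Done: sum = " << sum << "\n";
}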
I have trouble coming up with an idea how to report progress. After all, OMP is handling all the coordination between team threads for me, and I don't have a piece of global state.
I could potentially use a regular std::thread and observe some shared variable from there, but isn't there a more "omp-ish" way to achieve this?
My code below is similar to sehe's, with a few differences that let me cope with report points being skipped when the report condition is an exact equality on a modulo. Also, the global counter collects the actual loop iterations executed by all threads, but it might be imprecise, which is acceptable for this particular problem. Only the master thread does the reporting.
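A minimal sketch of this approach, assuming an atomically updated shared counter and a `>=` threshold in place of an exact modulo test. The names `SumResult`, `sum_with_master_progress` and `report_every` are hypothetical and only serve the illustration. The `_OPENMP` guards let it compile without OpenMP as well.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

struct SumResult {
    long double sum;      // reduction result
    std::size_t reports;  // number of progress lines printed
};

// Hypothetical helper: sums v and lets only the master thread report progress.
SumResult sum_with_master_progress(const std::vector<double>& v,
                                   std::size_t report_every)
{
    assert(report_every > 0);
    long double sum = 0;
    std::size_t done = 0;                    // iterations finished by all threads
    std::size_t next_report = report_every;  // touched by the master thread only
    std::size_t reports = 0;                 // touched by the master thread only

    #pragma omp parallel for reduction(+:sum)
    for (std::size_t i = 0; i < v.size(); ++i) {
        sum += v[i];

        std::size_t seen;
        #pragma omp atomic capture
        seen = ++done;

        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        // The master thread rarely observes the counter at an exact multiple of
        // report_every, because other threads advance it too. Comparing against
        // a moving threshold avoids skipping those report points.
        if (tid == 0 && seen >= next_report) {
            std::fprintf(stderr, "\rprogress: %3.0f%%", 100.0 * seen / v.size());
            ++reports;
            next_report = (seen / report_every + 1) * report_every;
        }
    }
    std::fputc('\n', stderr);
    return {sum, reports};
}
```

The reported value can lag behind the true count, since the master thread only reports while it still has iterations of its own to run.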
On processors without native atomic support (and even with them) using #pragma omp atomic, as the other answers here suggest, can slow your program down.
The idea of a progress indicator is to give the user an idea of when something will finish. If you're on target plus/minus a smallish fraction of the total run-time, the user isn't going to be too bothered. That is, the user would prefer that things finish sooner at the expense of knowing more exactly when things will finish.
For this reason, I usually track progress on only a single thread and use it to estimate total progress. This is just fine for situations in which each thread has a similar workload. Since you are using #pragma omp parallel for, you're likely working over a series of similar elements without interdependencies, so my assumption is probably valid for your use-case.
I've wrapped this logic in a class ProgressBar, which I usually include in a header file, along with its helper class Timer. The class uses ANSI escape codes to keep things looking nice.
The output looks like this:
It's also easy to have the compiler eliminate all the overhead of the progress bar by passing the -DNOPROGRESS compilation flag.
Code and an example usage follow:
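The single-thread estimate described above can be sketched roughly as follows. This is a simplified stand-in, not the ProgressBar class itself: `SimpleProgress`, `estimate_percent` and `sum_with_estimated_progress` are hypothetical names, there is no Timer, and a plain carriage return replaces the ANSI escape codes. It assumes a static schedule so that thread 0 handles about 1/N of the iterations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

class SimpleProgress {
    std::size_t total_;           // iterations across all threads
    std::size_t local_done_ = 0;  // iterations finished by thread 0 only
    int last_percent_ = -1;       // last value printed

public:
    explicit SimpleProgress(std::size_t total) : total_(total) { assert(total > 0); }

    // Extrapolate overall progress from the work one thread has done.
    static int estimate_percent(std::size_t local_done, std::size_t nthreads,
                                std::size_t total) {
        std::size_t p = local_done * nthreads * 100 / total;
        return p > 100 ? 100 : static_cast<int>(p);
    }

    // Call once per iteration from every thread; only thread 0 does any work,
    // so no atomics or locks are needed.
    void tick() {
#ifndef NOPROGRESS
        int tid = 0, nthreads = 1;
#ifdef _OPENMP
        tid = omp_get_thread_num();
        nthreads = omp_get_num_threads();
#endif
        if (tid != 0) return;
        ++local_done_;
        int p = estimate_percent(local_done_, static_cast<std::size_t>(nthreads), total_);
        if (p != last_percent_) {  // print only when the percentage changes
            last_percent_ = p;
            std::fprintf(stderr, "\r[%3d%%]", p);
        }
#endif
    }

    int last_percent() const { return last_percent_; }
};

// Hypothetical usage: the summation loop from the question with a progress tick.
long double sum_with_estimated_progress(const std::vector<double>& v) {
    long double sum = 0;
    if (v.empty()) return sum;
    SimpleProgress progress(v.size());
    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (std::size_t i = 0; i < v.size(); ++i) {
        sum += v[i];
        progress.tick();
    }
    std::fputc('\n', stderr);
    return sum;
}
```

Compiling with -DNOPROGRESS turns `tick()` into an empty function, which the compiler can then inline away.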
Just let each thread in the team track local progress and update a global counter atomically. You could still make another thread observe it, or, as in my sample below, you could just do the terminal output within an OMP critical section.
The key here is to tune for a stepsize that doesn't lead to highly frequent updates, because then the locking for the critical region (and to a lesser extent the atomic load/stores) would degrade performance.
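A minimal sketch of that idea, assuming stderr output and a caller-chosen step size. The names `ProgressSum`, `sum_with_stepped_progress` and `step` are hypothetical. Each thread counts locally, publishes to the global counter once per `step` iterations, and prints inside a named critical section.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

struct ProgressSum {
    long double sum;        // reduction result
    std::size_t processed;  // final value of the global counter
};

ProgressSum sum_with_stepped_progress(const std::vector<double>& v, std::size_t step)
{
    assert(step > 0);
    long double sum = 0;
    std::size_t processed = 0;  // global counter shared by the team

    #pragma omp parallel
    {
        std::size_t local = 0;  // per-thread progress, no synchronization needed

        #pragma omp for reduction(+:sum)
        for (std::size_t i = 0; i < v.size(); ++i) {
            sum += v[i];
            if (++local % step == 0) {
                // Publish a whole step at once and keep the value we produced.
                std::size_t snapshot;
                #pragma omp atomic capture
                { processed += step; snapshot = processed; }

                // Serialize terminal output so lines do not interleave.
                #pragma omp critical(progress_output)
                std::fprintf(stderr, "\rprocessed %zu of %zu", snapshot, v.size());
            }
        }

        // Flush the iterations that did not fill a complete step.
        #pragma omp atomic
        processed += local % step;
    }
    std::fputc('\n', stderr);
    return {sum, processed};
}
```

A larger `step` means fewer atomic updates and fewer trips through the critical section, at the cost of coarser progress output.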
Finally, print the result. Output: