I am trying to parallelize an operation using pthreads. The process looks something like:
double* doSomething( .... ) {
double* foo;
foo = new double[220];
for(i = 0; i<20; i++)
{
//do something with the elements in foo located between 10*i and 10*(i+2)
}
return foo;
}
The stuff happening inside the for-loop can be done in any order, so I want to organize this using threads.
For instance, I could use a number of threads such that each thread goes through parts of the for-loop, but works on different parts of the array. To avoid trouble when working on overlapping parts, i need to lock some memory.
How can I make a mutex (or something else) that locks only part of the array?
If you are using latest gcc you can try parallel versions of standard algorithms. See the libstdc++ parallel mode.
If you just want to make sure that a section of the array is worked once...
Make a global variable:
int _iNextSection;
Whenever a thread gets ready to operate on a section, the thread gets the next available section this way:
iMySection = __sync_fetch_and_add(&_iNextSection, 1);
__sync_fetch_and_add() returns the current value of _iNextSection and then increments _iNextSection. __sync_fetch_and_add() is atomic, which means __sync_fetch_and_add() is guaranteed to complete before another thread can do it. No locking, no blocking, simple, fast.
If the loop looks exactly like you wrote, I would use an array of 21 mutexes and block in each thread on ith an (i + 1)th mutex on the beginning of the loop.
So something like:
...
for (i = 0; i < 20; i++) {
mutex[i].lock();
mutex[i+1].lock();
...
mutex[i+1].unlock();
mutex[i].unlock();
}
The logic is that only two neighboring loop executions can access same data (if the limits are [i * 10, (i + 2) * 10)), so you only need to worry about them.