I'm fairly new to OpenMP and I'm trying to start an individual thread to process each item in a 2D array.
So essentially, this:
    for (i = 0; i < dimension; i++) {
        for (int j = 0; j < dimension; j++) {
            a[i][j] = b[i][j] + c[i][j];
        }
    }
What I'm doing is this:
    #pragma omp parallel for shared(a,b,c) private(i,j) reduction(+:diff) schedule(dynamic)
    for (i = 0; i < dimension; i++) {
        for (int j = 0; j < dimension; j++) {
            a[i][j] = b[i][j] + c[i][j];
        }
    }
Does this in fact start a thread for each item of the 2D array or not? How would I test that? If it is wrong, what is the correct way to do it? Thanks!
Note: The code has been greatly simplified.
Only the outer loop is parallel in your code sample. You can test this by printing `omp_get_thread_num()` in the inner loop: you will see that, for a given `i`, the thread number is the same (of course, this test is demonstrative rather than definitive, since different runs will give different results). For example, with: [example code missing] I get: [output missing]
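A minimal sketch of such a test follows. It is not the answerer's original program: `DIM`, `record_threads`, `rows_use_one_thread` and `print_threads` are names made up for illustration, and the `_OPENMP` guard is only there so the file still builds when OpenMP is switched off. It records the thread number for every `(i, j)` instead of printing from inside the loop, which keeps the output from interleaving.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Fallback so the sketch still builds without -fopenmp (runs on one thread). */
static int omp_get_thread_num(void) { return 0; }
#endif

#define DIM 4 /* hypothetical size, kept small so the output stays readable */

/* Record which thread handled each (i, j). Only the outer loop is parallel. */
void record_threads(int tid[DIM][DIM])
{
    int i;
    #pragma omp parallel for schedule(dynamic)
    for (i = 0; i < DIM; i++) {
        for (int j = 0; j < DIM; j++) { /* j is declared here, so it is private */
            tid[i][j] = omp_get_thread_num();
        }
    }
}

/* Return 1 if each row i was handled entirely by one thread, else 0. */
int rows_use_one_thread(int tid[DIM][DIM])
{
    for (int i = 0; i < DIM; i++)
        for (int j = 1; j < DIM; j++)
            if (tid[i][j] != tid[i][0])
                return 0;
    return 1;
}

/* Print one line per element: indices and the recorded thread number. */
void print_threads(int tid[DIM][DIM])
{
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++)
            printf("i=%d j=%d thread=%d\n", i, j, tid[i][j]);
}
```

Calling `record_threads(tid)` and then `print_threads(tid)` from `main` (built with something like `gcc -fopenmp`) should show the thread number changing between rows but never within a row; which thread gets which row will vary from run to run.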
As for the rest of your code, you might want to put more details in a new question (it's difficult to tell from the small sample), but, for example, you can't put `private(j)` since `j` is only declared later. It is automatically private in my example above. I guess `diff` is a variable that we can't see in the sample. Also, the loop variable `i` is automatically private (from the version 2.5 spec; same in the 3.0 spec).
Edit: All of the above is correct for the code that you and I have shown, but you may be interested in the following. For OpenMP Version 3.0 (available in e.g. gcc version 4.4, but not version 4.3) there is a `collapse` clause where you could write the code as you have, but with `#pragma omp parallel for collapse (2)` to parallelize both for loops (see the spec).
Edit: OK, I downloaded gcc 4.5.0 and ran the above code, but using `collapse (2)`, to get the following output, showing the inner loop now parallelized: [output missing]
Comments here (search for "Workarounds") are also relevant for work-arounds in version 2.5 if you want to parallelize both loops, but the version 2.5 spec cited above is quite explicit (see the non-conforming examples in section A.35).
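For the `collapse (2)` variant mentioned above, here is a minimal sketch, assuming a compiler with OpenMP 3.0 support. The names `N` and `add_collapsed` and the extra `tid` array are hypothetical, added only so the thread assignment can be inspected afterwards; this is not the code that was actually run in the answer.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Fallback so the sketch still builds without -fopenmp (runs on one thread). */
static int omp_get_thread_num(void) { return 0; }
#endif

#define N 4 /* hypothetical size */

/* a = b + c, with both loops merged into one parallel iteration space.
 * tid[i][j] receives the number of the thread that computed a[i][j]. */
void add_collapsed(double a[N][N], double b[N][N], double c[N][N],
                   int tid[N][N])
{
    int i, j;
    /* collapse(2) requires the two loops to be perfectly nested;
     * both i and j are loop iteration variables and hence private. */
    #pragma omp parallel for collapse(2) schedule(dynamic)
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            a[i][j] = b[i][j] + c[i][j];
            tid[i][j] = omp_get_thread_num();
        }
    }
}
```

With more than one thread available, `tid` may now hold different values within a single row, which is the sign that the inner loop is being shared out as well. Note that this still hands out iterations to a team of threads; it does not create one thread per element.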
You can try using nested `omp parallel for`s (after enabling nested parallelism, e.g. with an `omp_set_nested(1)` call), but they are not supported on all OpenMP implementations. So I would suggest making a 2D grid and starting all the threads on that grid from a single `for` loop, for example with a fixed 4x4 thread grid.
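A minimal sketch of the grid idea, not the answerer's original code: one parallel loop runs over the 16 tiles of a 4x4 grid, and each iteration processes its own rectangular block of the array. `DIM`, `GRID` and `add_tiled` are hypothetical names, and the bounds are computed so that a dimension that is not a multiple of 4 is still fully covered.

```c
#define DIM 10 /* hypothetical size; deliberately not a multiple of GRID */
#define GRID 4 /* 4x4 grid of tiles, as in the text */

/* a = b + c, parallelized over GRID*GRID tiles with a single parallel for. */
void add_tiled(double a[DIM][DIM], double b[DIM][DIM], double c[DIM][DIM])
{
    int k;
    #pragma omp parallel for
    for (k = 0; k < GRID * GRID; k++) {
        int ti = k / GRID; /* tile row in the grid */
        int tj = k % GRID; /* tile column in the grid */

        /* Integer arithmetic splits [0, DIM) into GRID contiguous,
         * non-overlapping ranges that together cover every index. */
        int i_min = ti * DIM / GRID, i_max = (ti + 1) * DIM / GRID;
        int j_min = tj * DIM / GRID, j_max = (tj + 1) * DIM / GRID;

        for (int i = i_min; i < i_max; i++)
            for (int j = j_min; j < j_max; j++)
                a[i][j] = b[i][j] + c[i][j];
    }
}
```

Because the tiles do not overlap, no two iterations write the same element. As written, the 16 tiles are shared among however many threads the runtime provides; adding `num_threads(GRID * GRID)` to the pragma would request one thread per tile, though the runtime may still provide fewer.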