Here's sequential code:
do i = 1, n
do j = i+1, n
if ("some_condition(i,j)") then
result = "here's result"
return
end if
end do
end do
Is there a cleaner way to execute iterations of the outer loop concurrently other than:
!$OMP PARALLEL private(i,j)
!$OMP DO
do i = 1, n
!$OMP FLUSH(found)
if (found) goto 10
do j = i+1, n
if ("some_condition(i,j)") then
!$OMP CRITICAL
!$OMP FLUSH(found)
if (.not.found) then
found = .true.
result = "here's result"
end if
!$OMP FLUSH(found)
!$OMP END CRITICAL
goto 10
end if
end do
10 continue
end do
!$OMP END DO NOWAIT
!$OMP END PARALLEL
The order of iterations over i
-loop may be arbitrary as long as some result
is found (it doesn't matter if it changes from run to run as long as it satisfies "some_condition"
).
It seems that your sequential code has a dependency that makes it unsuitable to being made parallel. Suppose that there are multiple values of i & j that make "some condition" true -- then the order of execution of the i & j do loops determines which of these conditions is found first and sets the value of result, after which the return statement ends the search for additional cases i,j that "some condition" is true. In the sequential code, the do loops always execute in the same order, so the operation of the program is deterministic and identical values of i & j that make "some condition" true will always be found. In a concurrent version, various loops i execute in non-deterministic order, so that from run to run different values of i might be the first i-value that finds a true "some condition".
Perhaps you as a programmer know that there is only one value of i & j that results in a true "some condition"? In that case short-circuiting the execution would seem OK. But the OpenMP spec says that "No statement in the associated loops other than the DO statements can cause a branch
out of the loops" so having the something in the inner loop abort the output loop isn't allowed. If it is the case that there is always only one true "some condition", you could just remove the "return" and waste CPU time by having threads look for "some condition" is true after the one case has been found. That might still be faster than a sequential program. With a scaler "result" variable, it still probably non-compliant, having an dependency on the order of execution. You could change it in to a "reduction", summing the result, or return result as 1-D array of dimension (n). If you need to find the smallest value of i that has "some condition" true, you could obtain that from an array result using the Fortran instrinsic function minloc.
A solution with many "flush" and "critical" directives may not be faster than the sequential version.
UPDATE: Based on the clarification that multiple results are possible and that any will do, one parallel method would be to return mutiple results and let sequential code pick one out -- make "result" into a 1D array rather than a scaler. You are allowed to short-circuit the inner j-loop because it is not "associated" with the "omp do" directive, so "result" need only be 1D, dimensioned according to the range of i. So something like this:
program test1
integer :: i, j
integer, parameter :: n = 10
integer, dimension (n) :: result
result = -999
!omp parallel default (shared) private (i, j)
!omp do
do i = 1, n
inner: do j = i+1, n
if ( mod (i+j,14) == 0 ) then
result (i) = i
exit inner
end if
end do inner
end do
!omp end do
!omp end parallel
write (*, *) 'All results'
write (*, *) result
write (*, *)
write (*, *) 'One result'
write (*, *) result ( maxloc (result, 1) )
end program test1
Another approach entirely would be to use the TASK construct which is part of OpenMP 3.0. What you seem to be trying to do is to divide your loops across threads, compute until any thread finds an answer, then have all threads stop. Trouble is, the necessity to have all threads check a shared flag is (a) killing your performance and (b) leading you into ugly loops with BREAKS and CYCLES.
I think @M.S.B.'s answer gives very good advice on how to adapt your existing approach. But, perhaps a more natural way of tackling the problem would be for the program to create a number of tasks (perhaps one for each iteration of your innermost loop) and to dispatch those to worker threads. Once any thread reports success all threads can be sent a finalisation task and your program can continue.
This would, of course, require more re-writing of your program and probably make sequential execution worse. It will definitely require that your implementation of OpenMP supports v3.0 of the standard.
And you may need more help in this area than I can manage, I've only just started playing with OpenMP TASKS myself.
It seems $OMP DO
doesn't allow break out of the loop earlier. An alternative might be to implement it by hand.
Give each thread fixed continuous range of indices to process
Following Guide into OpenMP: Easy multithreading programming for C++:
results = "invalid_value"
!$OMP PARALLEL private(i,j,thread_num,num_threads,start,end)
thread_num = OMP_GET_THREAD_NUM()
num_threads = OMP_GET_NUM_THREADS()
start = thread_num * n / num_threads + 1
end = (thread_num + 1) * n / num_threads
outer: do i = start, end
!$OMP FLUSH(found)
if (found) exit outer
do j = i+1, n
if ("some_condition") then
found = .true.
!$OMP FLUSH(found)
results(thread_num+1) = "here's result"
exit outer
end if
end do
end do outer
!$OMP END PARALLEL
! extract `result` from `results` if any
do i = 1, size(results)
if (results(i).ne."invalid_value") result = results(i)
end do
UPDATE: replaced goto
by exit
, introduced results
array based on @M. S. B.'s answer.
If solution exists this approach is faster then $OMP DO
due to earlier exit.
Give each thread one iteration at a time to process
Using task directive (suggested by @High Performance Mark):
!$OMP PARALLEL
!$OMP SINGLE
!$OMP TASK UNTIED
! "untied" allows other threads to generate tasks
do i = 1, n ! i is private
!$OMP TASK ! implied "flush"
task: do j = i+1, n ! i is firstprivate, j is private
if (found) exit task
if ("some_condition(i,j)") then
!$OMP CRITICAL
result = "here's result" ! result is shared
found = .true. ! found is shared
!$OMP END CRITICAL ! implied "flush"
exit task
end if
end do task
!$OMP END TASK
end do
!$OMP END TASK
!$OMP END SINGLE
!$OMP END PARALLEL
This variant is 2 times faster on my tests than the version with the outer
-loop.