I have been trying to understand the Segmented Ring Allreduce in Open MPI (v2.0.2), but I could not figure out how this pipelined ring allreduce works, especially how the phases are pipelined. (For example, COMPUTATION PHASE 1 (b) seems to perform the two phases concurrently rather than in a pipeline.) Could the MPI experts explain the motivation behind the Segmented Ring Allreduce and give details about the pipeline?
Much appreciated, Leo
I think this has been asked and answered at https://github.com/open-mpi/ompi/issues/4067
Very specific questions about the internals of Open MPI are better asked directly on the Open MPI mailing list(s) or in the GitHub repository.