How to share work roughly evenly between processes in MPI despite the array_size not being cleanly divisible by the number of processes?

Posted 2019-08-18 01:23

放荡不羁爱自由

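Hi all, I have an array of length N, and I'd like to divide it as evenly as possible between 'size' processors. N / size leaves a remainder, e.g. 1000 array elements divided among 7 processes, or 14 elements divided among 3 processes.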

I know of at least a couple of ways of sharing work in MPI, such as:

for (i = rank; i < N; i += size) { a[i] = DO_SOME_WORK; }

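However, this does not divide the array into contiguous chunks, which I'd like to do because I believe it is faster for IO reasons.

Another one I'm aware of is: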

int count = N / size;
int start = rank * count;
int stop = start + count;

// now perform the loop
int nloops = 0;

for (int i=start; i<stop; ++i)
{
    a[i] = DO_SOME_WORK;
}

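However, with this method, for my first example we get 1000 / 7 = 142 = count. So the last rank starts at 852 and ends at 994 (exclusive), and the last 6 elements (994 to 999) are ignored.

Would the best solution be to append something like this to the previous code?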

int remainder = N%size;
int start = N-remainder; 
if (rank == 0){
     for (i=start;i<N;i++){
         a[i] = DO_SOME_WORK;
     }
}

This seems messy, and if it's the best solution I'm surprised I haven't seen it elsewhere.

Thanks for any help!

Answer 1:

If I had N tasks (e.g., array elements) and size workers (e.g., MPI ranks), I would go as follows:

int count = N / size;
int remainder = N % size;
int start, stop;

if (rank < remainder) {
    // The first 'remainder' ranks get 'count + 1' tasks each
    start = rank * (count + 1);
    stop = start + count;
} else {
    // The remaining 'size - remainder' ranks get 'count' tasks each
    start = rank * count + remainder;
    stop = start + (count - 1);
}

for (int i = start; i <= stop; ++i) { a[i] = DO_SOME_WORK(); }

Here is how it works:

/*
  # ranks:                    remainder                     size - remainder
            /------------------------------------\ /-----------------------------\
     rank:      0         1             remainder-1                         size-1
           +---------+---------+-......-+---------+-------+-------+-.....-+-------+
    tasks: | count+1 | count+1 | ...... | count+1 | count | count | ..... | count |
           +---------+---------+-......-+---------+-------+-------+-.....-+-------+
                      ^       ^                            ^     ^
                      |       |                            |     |
   task #:  rank * (count+1)  |        rank * count + remainder  |
                              |                                  |
   task #:  rank * (count+1) + count   rank * count + remainder + count - 1

            \------------------------------------/ 
  # tasks:       remainder * count + remainder
*/
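The layout in the diagram can be checked without MPI using a small Python sketch. The helper name `block_range` is an assumption made for illustration only; it mirrors the C snippet above, including its inclusive `stop` index.

```python
def block_range(rank, size, n):
    """Return the inclusive (start, stop) index range owned by `rank`.

    Hypothetical helper mirroring the C snippet: the first `n % size`
    ranks own `n // size + 1` items, the remaining ranks own `n // size`.
    """
    count, remainder = divmod(n, size)
    if rank < remainder:
        start = rank * (count + 1)
        stop = start + count
    else:
        start = rank * count + remainder
        stop = start + count - 1
    return start, stop


if __name__ == "__main__":
    n, size = 1000, 7
    for rank in range(size):
        start, stop = block_range(rank, size, n)
        # Columns: rank, first index, last index, number of items
        print(rank, start, stop, stop - start + 1)
```

For N = 1000 and size = 7 the remainder is 6, so ranks 0 to 5 own 143 items each and rank 6 owns 142, with no gaps between consecutive ranges.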

Answer 2:

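Consider your "1000 steps and 7 processes" example.

Simple division won't work because integer division (in C) gives you the floor, and you are left with some remainder: 1000 / 7 is 142, and there will be 6 leftover items hanging around.
Ceiling division has the opposite problem: ceil(1000 / 7) is 143, but then the last processor either overruns the array or ends up with less to do than the others.

You are asking for a scheme that distributes the remainder evenly over the processors. Some processes should have 142 items, others 143. There must be a more formal approach, but considering how little attention this question has received in the last six months, maybe not.

Here's my approach. Every process runs this algorithm and just picks out the answer it needs for itself.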

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char ** argv)
{
#define NR_ITEMS 1000
    int i, rank, nprocs;
    int *bins;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    bins = calloc(nprocs, sizeof(int));

    int nr_alloced = 0;
    for (i=0; i<nprocs; i++) {
        int remainder = NR_ITEMS - nr_alloced;  /* items not yet assigned */
        int buckets = nprocs - i;               /* processes still waiting for a share */
        /* if you want the "big" buckets up front, do ceiling division */
        bins[i] = remainder / buckets;
        nr_alloced += bins[i];
    }

    if (rank == 0) {
        for (i=0; i<nprocs; i++) printf("%d ", bins[i]);
        printf("\n");
    }

    free(bins);

    MPI_Finalize();
    return 0;
}
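To see what the loop produces without launching MPI, here is a minimal Python sketch of the same idea. The function name `split_by_remaining` and the `big_first` flag are assumptions for illustration; the flag switches to ceiling division, which is what the comment in the C program suggests for putting the larger buckets first.

```python
def split_by_remaining(n_items, n_procs, big_first=False):
    """Give each bucket an even share of whatever is still unallocated.

    Hypothetical helper: big_first=False uses floor division as in the
    C program; big_first=True uses ceiling division instead.
    """
    bins = []
    allocated = 0
    for i in range(n_procs):
        remaining = n_items - allocated
        buckets = n_procs - i
        if big_first:
            share = -(-remaining // buckets)  # ceiling division on integers
        else:
            share = remaining // buckets
        bins.append(share)
        allocated += share
    return bins


if __name__ == "__main__":
    print(split_by_remaining(1000, 7))
    print(split_by_remaining(1000, 7, big_first=True))
```

With floor division the single 142-item bucket lands on the first process; with ceiling division it lands on the last one. Either way the counts add up to 1000.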

Answer 3:

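Here's a closed-form solution.

Let N = array length and P = number of processors.

For j = 0 to P - 1:

Starting point of the array on processor j = floor(N * j / P)

Length of the array on processor j = floor(N * (j + 1) / P) - floor(N * j / P)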
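The two formulas translate almost directly into code. The following is a sketch only; the function name `closed_form_block` is an assumption, and Python's `//` operator plays the role of floor for the non-negative values involved.

```python
def closed_form_block(j, n, p):
    """Return (start, length) of the chunk owned by processor j.

    Hypothetical helper implementing
        start  = floor(n * j / p)
        length = floor(n * (j + 1) / p) - floor(n * j / p)
    """
    start = (n * j) // p
    length = (n * (j + 1)) // p - start
    return start, length


if __name__ == "__main__":
    n, p = 1000, 7
    for j in range(p):
        print(j, closed_form_block(j, n, p))
```

Each processor can evaluate this for its own j with no loop over the other ranks. In C, the product N * (j + 1) may need a wider integer type than `int` when the array is large.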

Answer 4:

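I think the best solution is to write yourself a little function for splitting work across processes evenly enough. Here's some pseudo-code; I'm sure you can write C (is that C in your question?) better than I can.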

function split_evenly_enough(num_steps, num_processes)
    steps_per_process = floor(num_steps/num_processes)  ! base share; floor keeps the total right when the division is exact
    result = repmat(steps_per_process, 1, num_processes)  ! pseudo-Matlab for an array of num_processes copies of the base share
    result(1:mod(num_steps, num_processes)) = steps_per_process + 1  ! some processes have 1 more step
    return result
end

Answer 5:

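I know this thread is long since gone, but a simple way to do this is to give each process floor((number of items) / (number of processes)) + (1 if process_num < num_items mod num_procs). In Python, an array with the work counts: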

# Number of items
NI=128
# Number of processes
NP=20

# Items per process
[NI // NP + (1 if P < NI % NP else 0) for P in range(0, NP)]
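Counts alone do not say where each contiguous chunk begins. As a sketch, a running sum of the counts gives the start offsets; the function name `counts_and_offsets` is an assumption for illustration. Count and offset arrays of this shape are what variable-count collectives such as MPI_Scatterv take as their count and displacement arguments.

```python
from itertools import accumulate


def counts_and_offsets(n_items, n_procs):
    """Return per-process item counts and the start offset of each chunk.

    Hypothetical helper: the first `n_items % n_procs` processes receive
    one extra item, exactly as in the list comprehension above.
    """
    base, extra = divmod(n_items, n_procs)
    counts = [base + (1 if p < extra else 0) for p in range(n_procs)]
    # Offset of process p is the total number of items owned by processes 0..p-1.
    offsets = [0] + list(accumulate(counts))[:-1]
    return counts, offsets


if __name__ == "__main__":
    counts, offsets = counts_and_offsets(128, 20)
    print(counts)
    print(offsets)
```

For 128 items and 20 processes, the first 8 processes get 7 items and the remaining 12 get 6.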

Answer 6:

How about this?

int* distribute(int total, int processes) {
    int* distribution = new int[processes]();  // () value-initializes every count to 0
    int last = processes - 1;        

    int remaining = total;
    int process = 0;

    while (remaining != 0) {
        ++distribution[process];
        --remaining;

        if (process != last) {
            ++process;
        }
        else {
            process = 0;
        }
    }

    return distribution;
}

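The idea is that you assign an element to the first process, then an element to the second process, then an element to the third process, and so on, jumping back to the first process whenever the last one is reached.

This method works even when the number of processes is greater than the number of elements. It uses only very simple operations and should therefore be very fast.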
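As a cross-check, here is a small Python sketch comparing the round-robin counting with a division-based formula. Both function names are assumptions for illustration. The round-robin loop takes one step per element, while the division form needs only one step per process; the two should return identical counts.

```python
def distribute_round_robin(total, processes):
    """Hand out items one at a time, cycling through the processes."""
    counts = [0] * processes
    process = 0
    for _ in range(total):
        counts[process] += 1
        process = (process + 1) % processes  # wrap around after the last process
    return counts


def distribute_by_division(total, processes):
    """Same counts computed directly from quotient and remainder."""
    base, extra = divmod(total, processes)
    return [base + 1 if p < extra else base for p in range(processes)]


if __name__ == "__main__":
    print(distribute_round_robin(1000, 7))
    print(distribute_by_division(3, 5))  # more processes than items
```

With more processes than items, the surplus processes simply receive a count of 0.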

Answer 7:

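I had a similar problem, and here is my non-optimal solution using Python and the mpi4py API. An optimal solution would take into account how the processors are laid out; here the extra work is distributed to the lower ranks. The uneven workloads differ by only one task, so in general it should not be a big deal.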

from mpi4py import MPI
import sys
def get_start_end(comm,N):
    """
    Distribute N consecutive things (rows of a matrix , blocks of a 1D array)
    as evenly as possible over a given communicator.
    Uneven workload (differs by 1 at most) is on the initial ranks.

    Parameters
    ----------
    comm: MPI communicator
    N:  int
    Total number of things to be distributed.

    Returns
    ----------
    rstart: index of first local row
    rend: 1 + index of last row

    Notes
    ----------
    Index is zero based.
    """

    P      = comm.size
    rank   = comm.rank
    rstart = 0
    rend   = N
    if P >= N:
        if rank < N:
            rstart = rank
            rend   = rank + 1
        else:
            rstart = 0
            rend   = 0
    else:
        n = N//P # Integer division PEP-238
        remainder = N%P
        rstart    = n * rank
        rend      = n * (rank+1)
        if remainder:
            if rank >= remainder:
                rstart += remainder
                rend   += remainder
            else:
                rstart += rank
                rend   += rank + 1
    return rstart, rend

if __name__ == '__main__':
    comm = MPI.COMM_WORLD
    n = int(sys.argv[1])
    print(comm.rank,get_start_end(comm,n))

Source: How to share work roughly evenly between processes in MPI despite the array_size not being cleanly divisible by the number of processes?

Tags: mpi