可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I have a very large diagonal matrix that I need to split for parallel computation. Due to data locality issues it makes no sense to iterate through the matrix and split every n-th calculation between n threads. Currently, I am dividing k x k diagonal matrix in the following way but it yields unequal partitions in terms of the number of the calculations (smallest piece calculates a few times longer than the largest).

def split_matrix(k, n):
    split_points = [round(i * k / n) for i in range(n + 1)] 
    split_ranges = [(split_points[i], split_points[i + 1],) for i in range(len(split_points) - 1)]
    return split_ranges

import numpy as np
k = 100
arr = np.zeros((k,k,))
idx = 0
for i in range(k):
    for j in range(i + 1, k):
        arr[i, j] = idx
        idx += 1


def parallel_calc(array, k, si, endi):
     for i in range(si, endi):
         for j in range(k):
             # do some expensive calculations

for start_i, stop_i in split_matrix(k, cpu_cnt):
     parallel_calc(arr, k, start_i, stop_i)

Do you have any suggestions as to the implementation or library function?

回答1:

After a number of geometrical calculations on a side I arrived at the following partitioning that gives roughly the same number of points of the matrix in each of the vertical (or horizontal, if one wants) partitions.

def offsets_for_equal_no_elems_diag_matrix(matrix_dims, num_of_partitions):
    if 2 == len(matrix_dims) and matrix_dims[0] == matrix_dims[1]:  # square
        k = matrix_dims[0]
        # equilateral right angle triangles have area of side**2/2 and from this area == 1/num_of_partitions * 1/2 * matrix_dim[0]**2 comes the below
        # the k - ... comes from the change in the axis (for the calc it is easier to start from the smallest triangle piece)
        div_points = [0, ] + [round(k * math.sqrt((i + 1)/num_of_partitions)) for i in range(num_of_partitions)]
        pairs = [(k - div_points[i + 1], k - div_points[i], ) for i in range(num_of_partitions - 1, -1, -1)]
        return pairs

回答2:

I thin you should update your split_matrix method, as it returns one split range less, than you want (setting cpu_cnt=4 will return only 3 tuples, and not 4):

def split_matrix(k, n):
    split_points = [round(i * k / n) for i in range(n+1)] 
    return [(split_points[i], split_points[i + 1],) for i in range(len(split_points) - 1)]

Edit: If your data locality is not so string you could try this: create a queue of tasks, in which you add all indices/entries for which this calculation shall be performed. Then you initialize your parallel workers (e.g. using multiprocessing) and let them start. This worker now pick a element out of the queue, calculate the result, store it (e.g. in another queue) and continue with the next item, and so on.

If this is not working for your data, I don't think, that you can improve anymore.