I have a very large diagonal matrix that I need to split for parallel computation. Due to data locality issues it makes no sense to iterate through the matrix and split every n-th calculation between n threads. Currently, I am dividing the k x k diagonal matrix in the following way, but it yields partitions that are unequal in terms of the number of calculations (the smallest piece takes a few times longer to compute than the largest).
def split_matrix(k, n):
    split_points = [round(i * k / n) for i in range(n + 1)]
    split_ranges = [(split_points[i], split_points[i + 1]) for i in range(len(split_points) - 1)]
    return split_ranges
import numpy as np

k = 100
arr = np.zeros((k, k))
idx = 0
for i in range(k):
    for j in range(i + 1, k):
        arr[i, j] = idx
        idx += 1
def parallel_calc(array, k, si, endi):
    for i in range(si, endi):
        for j in range(k):
            # do some expensive calculations
            pass

cpu_cnt = 4  # number of available cores
for start_i, stop_i in split_matrix(k, cpu_cnt):
    parallel_calc(arr, k, start_i, stop_i)
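To quantify the imbalance described above, one can count the inner-loop iterations (upper-triangle entries) that fall into each chunk of the even row split; this is only an illustration of the problem, using the same split points as `split_matrix`:

```python
# Count the upper-triangle work (inner-loop iterations) per chunk
# of the even row split: row i contributes k - i - 1 entries.
k, n = 100, 4
split_points = [round(i * k / n) for i in range(n + 1)]
ranges = [(split_points[i], split_points[i + 1]) for i in range(n)]
work = [sum(k - i - 1 for i in range(si, ei)) for si, ei in ranges]
print(work)  # [2175, 1550, 925, 300] -- the first chunk does ~7x the work of the last
```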
Do you have any suggestions as to the implementation or library function?
I think you should update your `split_matrix` method, as it returns one split range less than you want (setting `cpu_cnt = 4` will return only 3 tuples, and not 4).

Edit: If your data locality is not so strong, you could try this: create a `queue` of tasks, in which you add all indices/entries for which this calculation shall be performed. Then you initialize your parallel workers (e.g. using `multiprocessing`) and let them start. Each worker picks an element out of the `queue`, calculates the result, stores it (e.g. in another `queue`) and continues with the next item, and so on. If this is not working for your data, I don't think you can improve it any more.
After a number of geometrical calculations on the side, I arrived at the following partitioning, which gives roughly the same number of matrix points in each of the vertical (or horizontal, if one prefers) partitions.
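One way to derive such cut points (a sketch, assuming each row i costs k - i - 1 inner iterations, as in the question's loop; the name `split_matrix_balanced` is mine): the work in rows [s, k) of the upper triangle is roughly (k - s)^2 / 2, so choosing s_i with (k - s_i)^2 = (1 - i/n) * k^2 yields n pieces of roughly equal work.

```python
import math

def split_matrix_balanced(k, n):
    # Equal-area cuts of the triangular workload: row i holds k - i - 1
    # entries, so rows [s, k) hold about (k - s)^2 / 2 of them.  Solving
    # (k - s_i)^2 = (1 - i/n) * k^2 for s_i balances the n chunks.
    split_points = [round(k * (1 - math.sqrt(1 - i / n))) for i in range(n + 1)]
    return [(split_points[i], split_points[i + 1]) for i in range(n)]
```

For k = 100, n = 4 this gives ranges like (0, 13), (13, 29), (29, 50), (50, 100): narrow chunks where rows are long, wide chunks where rows are short, so each worker gets a contiguous, roughly equal slice.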