I have a large number of independent tasks I would like to run, and I would like to distribute them on a parallel system such that each processor does the same amount of work, and maximizes my efficiency.
I would like to know if there is a general approach to finding a solution to this problem, or possibly just a good solution to my exact problem.
I have T=150 tasks I would like to run, and the time each task will take is t=T. That is, task1 takes 1 one unit of time, task2 takes 2 units of time... task150 takes 150 units of time. Assuming I have n=12 processors, what is the best way to divide the work load between workers, assuming the time it takes to begin and clean up tasks is negligible?
Despite my initial enthusiasm for @HighPerformanceMark's ingenious approach, I decided to actually benchmark this using GNU Parallel with
-j 12
to use 12 cores and simulated 1 unit of work with 1 second of sleep.First I generated a list of the jobs as suggested with:
That looks like this:
Then I pass the list into GNU Parallel and pick up the remaining 6 jobs at the end in parallel:
That runs in 16 mins 24 seconds.
Then I used my somewhat simpler approach, which is just to run big jobs first so you are unlikely to be left with any big ones at the end and thereby get imbalance in CPU load because just one big job needs to run and the rest of your CPUs have nothing to do:
And that runs in 15 minutes 48 seconds, so it is actually faster.
I think the problem with the other approach is that after the first 6 rounds of 12 pairs of jobs, there are 6 jobs left the longest of which takes 78 seconds, so effectively 6 CPUs sit there doing nothing for 78 seconds. If the number of tasks was divisible by the number of CPUs, that would not occur but 150 doesn't divide by 12.
The solution I came to was similar to those mentioned above. Here is the pseudo-code if anyone is interested:
This seemed to be the optimal method for me. I tried it on many different distributions of job run-times, and it seems to do a decent job of evenly distributing the work, so long as N_proc << N_jobs.
This is a slight modification from largest first, in that each processor first tries to avoid doing more than it's "fair share". If it must go over it's fair share, then it will attempt to stay near the fair answer by grabbing the smallest remaining task from the queue.