Algorithm to transform a workflow DAG into paralle

2019-04-09 09:50发布

Say I have a graph where nodes are workloads of various kinds and edges are dependencies between the workloads. (This is a DAG since cyclical dependencies must not exist.)

I also have a set of multiple agents who can perform the work.

Some workload varieties may be given to any agent, others must be given to a specific agent, and others must be given to one agent among a particular group of agents.

How do I assign workloads such that:

  • No workload is given to an agent until all its blocking workloads are completed

  • The shortest possible time is required to complete the total workload graph. (Note that minimizing agent idle time is generally good, but not a fundamental requirement - there may be scenarios under which one particular agent idles for longer but the total time to complete all jobs across all agents is at a minimum.)

Workloads have duration estimates, but assume for simplicity's sake that every workload takes equal time to compute. (Just break each workload down into multiple, serially-dependent workloads until every workload is effectively a constant-time operation.)

I'm aware of topological DAG sorting, but that produces a single, serial ordering of nodes. I have multiple agents operating in parallel, and the relationships are such that potentially large timing optimizations can be made by non-obvious reordering of the tasks.

The result of this would be rendered best as a Gantt chart of minimum overall duration. In fact, if you think of the problem as the allocation of bug tickets in a milestone to engineers in a team, with the goal of getting the milestone done ASAP, then you get the idea. (No... please don't tell me to import my graph into MS Project and then export it :) - I'm interested in the algorithm behind it!)

Pointers to well known algorithms, software libraries, or general issues and principles are much appreciated!

2条回答
太酷不给撩
2楼-- · 2019-04-09 10:18

Unless you have infinite number of agents so that a compatible agent is available as soon as all the predecessors of a task is done, this is an NP-hard problem.

< shameless plug >

A very similar problem is there in my book "Algorithms For Interviews"

< /shameless plug >

Here is the problem and the solution from the book:

We need to schedule N lectures in M classrooms. Some of those lectures are prerequisites for others. How would you choose when and where to hold the lectures in order to finish all the lectures as soon as possible?

Solution: We are given a set of N unit duration lectures and M classrooms. The lectures can be held simultaneously as long as no two lectures need to happen in the same classroom at the same time and all the precedence constraints are met. The problem of scheduling these lectures so as to minimize the time taken to completion is known to be NP-complete. This problem is naturally modeled using graphs. We model lectures as vertices, with an edge from vertex u to vertex v if u is a prerequisite for v. Clearly, the graph must be acyclic for the precedence constraints to be satisfied. If there is just one lecture room, we can simply hold the lectures in topological order and complete the N lectures in N time (assuming each lecture is of unit duration). We can develop heuristics by observing the following: at any time, there is a set of lectures whose precedence constraints have been satisfied. If this set is smaller than M, we can schedule all of them; otherwise, we need to select a subset to schedule. The subset selection can be based on several metrics:

  • Rank order lectures based on the length of the longest dependency chain that they are at the start of.
  • Rank order lectures based on the number of lectures that they are immediate prerequisites for.
  • Rank order lectures based on the total number of lectures that they are direct or indirect prerequisites for.

We can also use combinations of these criteria to order the lectures that are currently schedulable. For example, for each vertex, we define its criticality to be the length of a longest path from it to a sink. We schedule lectures by processing vertices in topological order. At any point in our algorithm, we have a set of candidate lectures-these are the lectures whose prerequisites have already been scheduled. If the candidate set is less than size M, we schedule all the lectures; otherwise, we choose the M most critical lectures and schedule those-the idea is that they should be scheduled sooner since they are at the start of longer dependency chains. The criterion is heuristic and may not lead to optimum schedules-this is to be expected since the problem is NP-complete. Other heuristics may be employed, e.g., we may use the number of lectures that depend on lecture L as the criticality of lecture L or some combination of the criterion.

查看更多
聊天终结者
3楼-- · 2019-04-09 10:26

The Wikipedia article on PERT might be a useful place to start.

查看更多
登录 后发表回答