Assignment Algorithm

2019-08-13 00:26发布

I need to assign N entities (each with possible parents and possible children) to M computation nodes while satisfying the following optimization conditions:

  1. Children of an entity want to be assigned to the same computation node (to maximize data locality among siblings)
  2. The distribution of entities should be as even as possible (i.e. no overtaxing of a single node).

I'm looking for some suggestions on heuristic methods to solve this problem.

I've read http://en.wikipedia.org/wiki/Assignment%5Fproblem.

Thanks.

2条回答
仙女界的扛把子
2楼-- · 2019-08-13 00:50

I'm not sure whether 1 is a hard requirement. If so, as a first step, you should group your entities into connected components. If not, you should specify what the tradeoff between 1 and 2 is, e.g. as a cost function.

Placing the components on computation nodes is the binpacking problem, if you limit each node to N/M entities. A good approximation is the following algorithm:

  1. sort the components by number of entities, in decreasing order
  2. place them onto nodes as long as each node has still capacity available
  3. when done with 2, you may have components that have not been placed. Place those on the nodes which have the smallest load so far.
查看更多
我命由我不由天
3楼-- · 2019-08-13 01:00

Well, obviously you have to predict how many children (or simply total load) each process will have on average. Than you can use classic assignment algorithms, they're usually quite simple.

The most important issue is of course to decide what you want to minimize. Usually we want to minimize lateness (how much "behind schedule" we get), but not always...

Edit: If you know all children/parents in advance, and children of a process must be in the same machine, you can consider a process and all its children to be the same process to begin with. Then you can use a very simple algorithm to minimize whatever you want to minimize.

Edit #2: Read about Job Scheduling to get a better idea

查看更多
登录 后发表回答