Here in the DCOS documents it is stated that
"Deciding where to run processes to best utilize cluster resources is hard, NP-hard in-fact."
I don't deny that that sounds right, but is there a proof somewhere?
Best utilization of resources is a variation of the bin packing problem:
In the bin packing problem, objects of different volumes must be packed into a finite number of bins or containers each of volume V in a way that minimizes the number of bins used. In computational complexity theory, it is a combinatorial NP-hard problem. The decision problem (deciding if objects will fit into a specified number of bins) is NP-complete.
We have an n-dimensional space where every dimension corresponds to one resource type (e.g. CPU, memory, disk). Each task to be scheduled has a specific volume defined by the resources it requires. Additionally, a task can have placement constraints; these slightly change the original problem, but each constraint can be treated as an extra discrete dimension. The goal is to schedule tasks in a way that minimizes slack (unused) resources and so prevents fragmentation.
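To make the multi-dimensional view concrete, here is a toy sketch (the resource names and numbers are illustrative assumptions, not any Mesos/DCOS API): a node's free capacity and a task's demand are vectors over the resource dimensions, and a task "fits" only if it fits in every dimension at once.

```python
# Illustrative resource dimensions; real clusters may have more (ports, GPUs, ...).
RESOURCES = ("cpus", "mem_gb", "disk_gb")

def fits(task, node_free):
    """A task fits on a node only if its demand fits in every dimension."""
    return all(task[r] <= node_free[r] for r in RESOURCES)

node = {"cpus": 4, "mem_gb": 8, "disk_gb": 100}
task = {"cpus": 2, "mem_gb": 6, "disk_gb": 10}
big  = {"cpus": 2, "mem_gb": 16, "disk_gb": 10}  # too much memory

print(fits(task, node))  # True: fits in every dimension
print(fits(big, node))   # False: one oversubscribed dimension is enough to fail
```

Note that a task can be rejected even when the node has plenty of total capacity, as long as a single dimension is oversubscribed; this is what makes the multi-dimensional variant harder than packing by a single size.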
For example, Marathon uses the first-fit algorithm, which is an approximation algorithm but is not that bad:
This is a very straightforward greedy approximation algorithm. The algorithm processes the items in arbitrary order. For each item, it attempts to place the item in the first bin that can accommodate the item. If no bin is found, it opens a new bin and puts the item within the new bin.
It is rather simple to show that this algorithm achieves an approximation factor of 2; that is, the number of bins used by this algorithm is no more than twice the optimal number of bins.
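The first-fit heuristic described above can be sketched in a few lines for the one-dimensional case (the integer capacity and item sizes here are made up for illustration):

```python
def first_fit(items, capacity):
    """Place each item into the first bin with enough remaining room,
    opening a new bin when no existing bin fits. Returns the bins."""
    bins = []  # each bin is a list of item sizes
    for item in items:
        for b in bins:
            if sum(b) + item <= capacity:
                b.append(item)
                break
        else:
            # no existing bin can accommodate the item: open a new bin
            bins.append([item])
    return bins

bins = first_fit([5, 7, 5, 2, 4, 2, 5, 1, 6], capacity=10)
print(len(bins), bins)  # 5 [[5, 5], [7, 2, 1], [4, 2], [5], [6]]
```

In this run first-fit uses 5 bins while the total volume (37) means at least 4 bins are needed, which is consistent with the factor-2 bound: the heuristic can waste space, but never more than twice the optimum.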