Filling bins with an equal size

Posted 2019-03-01 01:39

Question:

I have 100 groups, and each group contains some number of elements. For cross-validation, I want to make five bins whose sizes are as equal as possible.

Is there an algorithm for this purpose?

An example for 5 groups and 2 bins:

Group_1: 5
Group_2: 6
Group_3: 2
Group_4: 7
Group_5: 1

The two bins will be:

G1 and G2 -> their sum is equal to 11.

G3, G4 and G5 -> their sum is equal to 10.

Answer 1:

This seems related to the set partitioning problem, which is NP-hard but fortunately admits lots of good approximation algorithms and pseudopolynomial-time dynamic programming algorithms. You may want to look into those as a starting point, since there's already quite a lot of work that's been done in this area.
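As a rough illustration of the pseudopolynomial dynamic programming idea, here is a minimal sketch for the two-bin case only. The function name `two_way_partition` is made up for this example, and it assumes the group sizes are non-negative integers. It records which sums up to half the total are reachable, then takes the largest one.

```python
def two_way_partition(sizes):
    """Split groups into two bins with sums as close as possible.

    Subset-sum style DP; assumes non-negative integer sizes.
    Returns two lists of group indices.
    """
    half = sum(sizes) // 2
    # reachable[s] = one list of group indices whose sizes add up to s
    reachable = {0: []}
    for i, size in enumerate(sizes):
        # Iterate over a snapshot so each group is used at most once.
        for s, members in list(reachable.items()):
            t = s + size
            if t <= half and t not in reachable:
                reachable[t] = members + [i]
    # The largest reachable sum not exceeding half gives the most balanced split.
    bin_a = reachable[max(reachable)]
    chosen = set(bin_a)
    bin_b = [i for i in range(len(sizes)) if i not in chosen]
    return bin_a, bin_b


sizes = [5, 6, 2, 7, 1]  # the example from the question
bin_a, bin_b = two_way_partition(sizes)
print(sum(sizes[i] for i in bin_a), sum(sizes[i] for i in bin_b))
```

The running time grows with the number of groups times the total size, which is why this is pseudopolynomial rather than polynomial. Extending it to five bins is considerably more involved.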

Hope this helps!



Answer 2:

This is not a cluster analysis problem (I rewrote the question to use the more appropriate wording for you). Cluster analysis is a structure discovery task.

Instead, have a look at the following two related problems from computer science:

  • Multiprocessor scheduling seems to be what you need: given n processors, distribute the tasks so that as little processor time as possible goes unused.
  • Bin packing is a classic NP-hard problem that solves the reverse problem: use as few fixed-size bins as possible to accommodate all tasks.
  • The k-partition problem is probably what you want to do.
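A common heuristic for problems of this kind is a greedy "largest first" rule: sort the groups by size, then repeatedly put the next group into the bin that is currently lightest. The sketch below is only an illustration of that heuristic, not an exact solver; the name `greedy_bins` is invented for this example.

```python
import heapq


def greedy_bins(sizes, k):
    """Greedy heuristic: assign each group, largest first, to the lightest bin.

    Returns a list of k bins, each a list of group indices.
    The result is not guaranteed to be optimal.
    """
    heap = [(0, b) for b in range(k)]  # (current load, bin index)
    bins = [[] for _ in range(k)]
    order = sorted(range(len(sizes)), key=lambda i: sizes[i], reverse=True)
    for i in order:
        load, b = heapq.heappop(heap)  # lightest bin so far
        bins[b].append(i)
        heapq.heappush(heap, (load + sizes[i], b))
    return bins


sizes = [5, 6, 2, 7, 1]  # the example from the question
for members in greedy_bins(sizes, 2):
    print(members, sum(sizes[i] for i in members))
```

For 100 groups and five bins this runs almost instantly, so it can serve as a baseline before trying anything more elaborate.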

All of these appear to be NP-hard, so for large data you will want to use an approximation. With just 5 groups, as in your example, you can easily brute-force all combinations.
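For a tiny input, brute force could look like the sketch below. It tries every assignment of groups to bins and keeps the one with the smallest gap between the heaviest and lightest bin. The name `brute_force_bins` and the choice of "max load minus min load" as the balance measure are assumptions made for this example.

```python
from itertools import product


def brute_force_bins(sizes, k):
    """Try all k**n assignments; only feasible for a handful of groups.

    Returns (assignment, spread), where assignment[i] is the bin of group i
    and spread is max bin load minus min bin load.
    """
    best_assignment, best_spread = None, None
    for assignment in product(range(k), repeat=len(sizes)):
        loads = [0] * k
        for size, b in zip(sizes, assignment):
            loads[b] += size
        spread = max(loads) - min(loads)
        if best_spread is None or spread < best_spread:
            best_assignment, best_spread = assignment, spread
    return best_assignment, best_spread


assignment, spread = brute_force_bins([5, 6, 2, 7, 1], 2)
print(assignment, spread)
```

With 5 groups and 2 bins there are only 32 assignments to check. With 100 groups and 5 bins there are 5**100, which is why an approximation is needed at that scale.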



Answer 3:

If you're looking for a clustering algorithm (partitioning method) with an equal-size constraint, I would suggest spectral clustering. It should satisfy your demand for clusters of almost the same size because it solves the normalized cut problem, which tries to find a balanced cut.