I need an algorithm for dividing different manufacturing parts into uneven groups. The main condition is that the difference between the maximum number in a group and all the other numbers in that group should be as low as possible. For example, if we have the list [1,3,4,11,12,19,20,21] and we decide that it should be divided into 3 parts, it should be divided into [1,3,4],[11,12],[19,20,21]. In the same case, if we decide to divide it into 4, we would get [1,3,4],[11],[12],[19,20,21].
To clarify "difference between the maximum number in the group and all others": [1,3,4] = 4 - 1 + 4 - 3 + 4 - 4 = 4, [11] = 11 - 11 = 0, [12,19] = 19 - 12 + 19 - 19 = 7, [20,21] = 21 - 20 + 21 - 21 = 1. Total difference = 12. In another possible case: [1,3,4] = 4 - 1 + 4 - 3 + 4 - 4 = 4, [11,12,19] = 19 - 11 + 19 - 12 + 19 - 19 = 15, [20,21] = 21 - 20 + 21 - 21 = 1. Total difference = 20. This is a calculation of over-performance: the largest number in a group (representing, for example, strength) needs to replace the smallest number in the group (the weakest). Using a super-strong part would be too expensive or heavy, so optimization is needed.
My first idea was to slice the list in all possible ways, calculate the "difference between the maximum number in the group and all others in the group" for each slicing, and then select the slicing with the smallest total difference.
I was wondering if there is a built-in function for this in Python, Spyder, or similar. If I need to write the code myself, could you help me?
I'm trying to work on a random list (10 numbers) so that I can reapply it in different situations: `l = sorted(random.sample(range(100), 10))`.
As you don't mention what the logic behind your slicing is, for a start I suggest this function:
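A minimal sketch of such a function, assuming `le` is the length of the list and `n` the requested number of groups. The name `slicer` is hypothetical and the body is only one plausible reading of the idea (even slicing by a rounded chunk length). Because of the rounding, the number of chunks returned can differ from `n`.

```python
import numpy as np

def slicer(l, n):
    """Cut l into consecutive chunks of about len(l) / n items each."""
    le = len(l)
    # Round the ideal chunk length to the nearest integer (never below 1).
    step = max(1, int(np.around(float(le) / n)))
    return [l[i:i + step] for i in range(0, le, step)]

print(slicer([1, 3, 4, 11, 12, 19, 20, 21], 3))
# -> [[1, 3, 4], [11, 12, 19], [20, 21]]
```

This slicing only looks at positions, not at the values, so it does not try to minimize the difference described in the question.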
Here I use `numpy.around` to round `float(le)/n` in order to get a true slicing!

Edit: based on the clarified question, here is another algorithm. I still kept the original response below in case it's relevant.
You can solve the problem using dynamic programming. Note that the code below is not optimized for speed, because I thought that would make it too hard to understand. If you implement it carefully, you can do it in `O(N * K)`, where `N` is the length of `a` and `K` is the number of sets to partition into.

Original response below.
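For the dynamic-programming idea from the edit above, a minimal sketch could look like the following. It assumes the groups are contiguous runs of the sorted list and that the score of a group is the sum of `max(group) - x` over its elements, as in the question. The function name and the table layout are illustrative choices, and this straightforward version runs in roughly `O(K * N^2)`, not the `O(N * K)` a careful implementation could reach.

```python
def partition_min_overperformance(a, k):
    """Split a into k contiguous sorted groups minimising sum(max(g) - x)."""
    a = sorted(a)
    n = len(a)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(a)")
    # prefix[j] holds sum(a[:j]) so that group sums cost O(1).
    prefix = [0]
    for x in a:
        prefix.append(prefix[-1] + x)

    def cost(i, j):
        # Score of group a[i:j]; its maximum is a[j - 1] because a is sorted.
        return a[j - 1] * (j - i) - (prefix[j] - prefix[i])

    inf = float("inf")
    # best[g][j]: minimal total score when a[:j] is split into g groups.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    # cut[g][j]: start index of the last group in that optimal split.
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0
    for g in range(1, k + 1):
        for j in range(g, n + 1):
            for i in range(g - 1, j):
                candidate = best[g - 1][i] + cost(i, j)
                if candidate < best[g][j]:
                    best[g][j] = candidate
                    cut[g][j] = i
    # Walk the cut table backwards to recover the groups.
    groups = []
    j = n
    for g in range(k, 0, -1):
        i = cut[g][j]
        groups.append(a[i:j])
        j = i
    return best[k][n], groups[::-1]

print(partition_min_overperformance([1, 3, 4, 11, 12, 19, 20, 21], 3))
# -> (8, [[1, 3, 4], [11, 12], [19, 20, 21]])
```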
Here are two approaches that might do what you want. Suppose your numbers are, in ascending order, `a[0], a[1], ..., a[n - 1]`.
Let `max_diff(S)` denote the maximum difference between two elements of a set `S`. We want to split up the numbers into sets `S[0], ..., S[k - 1]` such that the `max_diff(S[i])` are small.

First, suppose we are trying to minimize the sum of the `max_diff(S[i])`. Notice that the sum of the `max_diff(S[i])` is just `a[n - 1] - a[0]` minus the "gaps" between the `S[i]`. Thus, you can just find the `k - 1` largest of the `a[i + 1] - a[i]` and exclude those. In Python code:

Alternatively, suppose we are trying to minimize the maximum of the `max_diff(S[i])`. Then, we can do a binary search on the achievable value. In code:

Based on your updated comments, it sounds like you are looking for the K-Means algorithm, or similar things, that will cluster your list elements into distinct groups based on their distance from proposed centers (this is what your difference calculation is really measuring).
In your criterion, note that it never makes sense to subtract the max of each subgroup from itself, since this is always zero by definition. So really you're looking at the sum of the max minus each element, over all non-max elements (what to do with duplicates is also a question you need to answer). K-Means will do something different (it will look at every point's distance from the average of the points), but in spirit it's the same. You can modify k-means to use your notion of a group score, although I don't really see any benefit to that in terms of the clustering output -- I'd need to see some kind of math proofs about the limiting behavior of the different criteria to be convinced that it matters.
You can achieve this reasonably easily with the `sklearn` and `numpy` modules:

Then look at `km.labels_`:

You can see that this would put together `[1, 2, 3]`, `[11, 12]`, `[20, 21, 22]`, `[30, 35]`. Below is some code that actually gets this for you:

But note that this is not perfect: it is an iterative method not guaranteed to converge to any "true" solution, and for bizarre enough input data, you can get bizarre output.
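The K-Means approach described above can be sketched roughly as follows with `sklearn.cluster.KMeans`. The input list is an assumption, reconstructed from the groups quoted above, and `groups_from_labels` is a hypothetical helper name. `n_init` and `random_state` are set only to make the run repeatable.

```python
import numpy as np
from sklearn.cluster import KMeans

# Assumed input, reconstructed from the groups quoted in the text.
data = [1, 2, 3, 11, 12, 20, 21, 22, 30, 35]

# KMeans expects a 2-D array: one row per sample, one column per feature.
X = np.array(data, dtype=float).reshape(-1, 1)
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

def groups_from_labels(values, labels):
    """Collect the values that share a cluster label into one list each."""
    grouped = {}
    for value, label in zip(values, labels):
        grouped.setdefault(int(label), []).append(value)
    # Cluster labels are arbitrary integers, so order the groups by content.
    return sorted(grouped.values())

groups = groups_from_labels(data, km.labels_)
print(km.labels_)
print(groups)
```

The label numbers themselves carry no meaning and can change between runs or library versions; only which elements share a label matters.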
Alternatively, a more basic understanding of what you want is to choose index integers `i[0]` through `i[k]`, such that

with `i[0]=0` and `i[k+1]` understood to mean "everything else in the list." Then define:

So a solution for you is a tuple of parameters, `(k, i[0], ..., i[k])`, and you want the choice that minimizes the above expression `criterion`.

A generic solution for this problem is quite complicated. But if you're willing to accept a greedy solution that will be very balanced except for the final sublist, many of these solutions will do.
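For short lists, the index formulation above can also be checked exhaustively. The sketch below is an assumption-laden illustration, not a general solution. It fixes the number of groups `k` (with `k` free, one-element groups would trivially score zero), scores a group as the sum of its maximum minus each element as in the question, and tries every placement of the cut indices. The function names are hypothetical.

```python
from itertools import combinations

def overperformance(group):
    """Sum of (max - x) over the group, the score used in the question."""
    top = max(group)
    return sum(top - x for x in group)

def best_split_bruteforce(values, k):
    """Try every way to cut the sorted list into k contiguous groups."""
    a = sorted(values)
    n = len(a)
    best_cost, best_groups = None, None
    # Choose k - 1 interior cut positions out of the n - 1 possible ones.
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0,) + cuts + (n,)
        groups = [a[start:end] for start, end in zip(bounds, bounds[1:])]
        total = sum(overperformance(g) for g in groups)
        if best_cost is None or total < best_cost:
            best_cost, best_groups = total, groups
    return best_cost, best_groups

print(best_split_bruteforce([1, 3, 4, 11, 12, 19, 20, 21], 3))
# -> (8, [[1, 3, 4], [11, 12], [19, 20, 21]])
```

The number of candidate splits grows combinatorially with the list length, so this is only practical for small inputs such as the 10-element lists mentioned in the question.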