Splitting up a list into different length parts un

2019-09-06 20:35发布

I need an algorithm of dividing different manufacturing parts in to uneven groups. The main condition is that difference between maximum number in the group and all others should be as low as possible. For

example:

if we have list [1,3,4,11,12,19,20,21] and we decide that it should be divided in 3 parts it should be divided into [1,3,4],[11,12],[19,20,21]. In the same case if we decide to divide it in to 4 we would get :

 [1,3,4],[11],[12],[19,20,21].

In order to clarify "difference between maximum number in the group and all others" - [1,3,4] = 4 - 1 + 4 - 3 + 4 - 4 = 4,[11] = 11 - 11 = 0 ,[12,19] = 19 - 12 + 19 - 19 = 7 ,[20,21] = 21 -20 + 21 - 21 = 1. Total difference = 12. In the other possible case [1,3,4] = 4 - 1 + 4 - 3 + 4 - 4 = 4,[11,12,19] = 19 - 11 + 19 - 12 + 19 - 19 = 12,[20,21] = 21 - 20 + 21 - 21 = 0. Total difference = 16. This is calculation of over performance. This is due to the fact that larges number (representing for example strength) need to replace smallest number in the group (weakest). Using super strong part would be too expensive or heavy so optimization is needed.

So first I was thinking to slice the list in all possible combinations and then calculate the "difference between maximum number in the group and all others in the group". Then select as a final result the one with smallest minimum difference.

I was wondering if there is some build in function in python or Spyder or similar. If I need to write a code could you help me?

I'm trying to work on random list divided in to 10 in order to reapply it in different situations. l = sorted(random.sample(range(100), 10)).

3条回答
小情绪 Triste *
2楼-- · 2019-09-06 21:04

As you don't mention whats the logic behind your slicing for beginning i suggest this function :

>>> def slicer(l,n):
...  le=len(l)
...  S=int(np.around(float(le)/n))
...  return [l[i:i+S] for i in range(0,le,S)]
... 
>>> slicer([1,3,4,11,12,19,20,21],2)
[[1, 3, 4, 11], [12, 19, 20, 21]]
>>> slicer([1,3,4,11,12,19,20,21],3)
[[1, 3, 4], [11, 12, 19], [20, 21]]
>>> slicer([1,3,4,11,12,19,20,21],4)
[[1, 3], [4, 11], [12, 19], [20, 21]]

Here i use numpy.around to round the float(le)/n for got a true slicing !

查看更多
成全新的幸福
3楼-- · 2019-09-06 21:23

Edit: based on clarified question, here is another algorithm. I still kept the original response below in case it's relevant.

You can solve the problem using dynamic programming. Note that the code below is not optimized for speed, because I thought that would make it too hard to understand. If you implement it carefully, you can do it in O(N * K), where N is the length of a and K is the number of sets to partition into.

a = [1,3,4,11,12,19,20,21]
S = []
K = 3

# memoize results in (len(a) + 1) by K array                                                                                                                             
memo_partitions = [[None for j in xrange(len(a) + 1)] for i in xrange(K + 1)]

def compute_cost(arr):
    # this is the objective to be minimized                                                                                                                              
    if len(arr) == 0:
        return 0
    return sum(arr[-1] - x for x in arr)

def compute_best_partition(k, n):
    # computes the best partition of the first `n` elements of `a`                                                                                                       
    # into `k` parts                                                                                                                                                     
    if n == 0:
        return [[] for _ in xrange(k)], 0
    if k == 1:
        return [a[:n]], compute_cost(a[:n])

    if memo_partitions[k][n] is not None:
        return memo_partitions[k][n]

    best_partition = [[] for _ in xrange(k - 1)] + [a[:n]]
    best_cost = compute_cost(a[:n])
    for i in xrange(1, n):
        last_group = a[i:n]
        additional_cost = compute_cost(last_group)
        partition, cost = compute_best_partition(k - 1, i)

        if cost + additional_cost < best_cost:
            best_partition = partition[:]
            best_partition.append(last_group)
            best_cost = cost + additional_cost

    memo_partitions[k][n] = (best_partition, best_cost)
    return memo_partitions[k][n]

best_partition, cost = compute_best_partition(K, len(a))
print best_partition

Original response below.

Here are two approaches that might do what you want. Suppose your numbers are, in ascending order,

a[0], a[1], ... , a[n - 1]

Let max_diff(S) denote the maximum difference between two elements of a set S. We want to split up the numbers into sets S[0], ... , S[k - 1] such that the max_diff(S[i]) are small.

First, suppose we are trying to minimize the sum of the max_diff(S[i]). Notice that the sum of the max_diff(S[i]) is just a[n - 1] - a[0] minus the "gaps" between the S[i]. Thus, you can just find the k - 1 largest of the a[i + 1] - a[i] and exclude those. In python code,

a = [1,3,4,11,12,19,20,21]
S = []
k = 3

diffs = [(a[i + 1] - a[i], i) for i in xrange(len(a) - 1)]
diffs.sort()
best_cuts = [i for diff, i in diffs[-k:]]
best_cuts.sort()

last_cut = 0
for cut in best_cuts:
    S.append(a[last_cut:cut + 1])
    last_cut = cut + 1
S.append(a[last_cut:])
print S

Alternatively, suppose we are trying to minimize the maximum of the max_diff(S[i]). Then, we can do a binary search on the achievable value. In code,

a = [1,3,4,11,12,19,20,21]
S = []
k = 3

best_partition = None
low, high = 0, max(a)
while low < high:
    mid = (low + high) / 2

    # try to get all max_diffs <= mid                                                                                                                                    
    full_partition = []
    last_set = [a[0]]
    for val in a[1:]:
        if val > last_set[0] + mid:
            full_partition.append(last_set)
            last_set = [val]
        else:
            last_set.append(val)
    full_partition.append(last_set)

    if len(full_partition) > k:
        low = mid + 1
    else:
        high = mid
        best_partition = full_partition

S = best_partition
print S
查看更多
淡お忘
4楼-- · 2019-09-06 21:24

Based on your updated comments, it sounds like you are looking for the K-Means algorithm, or similar things, that will cluster your list elements into distinct groups based on their distance from proposed centers (this is what your difference calculation is really measuring).

In your criterion, note that it never makes sense to subtract the max of each subgroup from itself, since this is always zero by definition. So really you're looking at the sum of the max minus each element, over all non-max elements (what to do with duplicates is also a question you need to answer). K-Means will do something different (it will look at every point's distance from the average of the points), but in spirit it's the same. You can modify k-means to use your notion of a group score, although I don't really see any benefit to that in terms of the clustering output -- I'd need to see some kind of math proofs about the limiting behavior of the different criteria to be convinced that it matters.

You can achieve this reasonably easily with the sklearn and numpy modules:

from sklearn import cluster as cluster
import numpy as np

km = cluster.KMeans(n_clusters=4)
example_data = np.asarray([1,2,3, 11,12, 20,21,22, 30,35])[:,None]

km.fit(example_data)

Then look at km.labels_:

In [65]: km.labels_
Out[65]: array([0, 0, 0, 3, 3, 1, 1, 1, 2, 2], dtype=int32)

You can see that this would put together [1,2,3], [11, 12], [20, 21 , 22], [30, 35]. Below is some code that actually gets this for you:

In [74]: example_data.tolist()[0]
Out[74]: [1, 2, 3, 11, 12, 20, 21, 22, 30, 35]

In [75]: [[x for i,x in enumerate(example_data.tolist()[0]) if km.labels_[i] == j] 
          for j in range(km.n_clusters)]

Out[75]: [[1, 2, 3], [20, 21, 22], [30, 35], [11, 12]]

But note that this is not perfect: it is an iterative method not guaranteed to converge to any "true" solution, and for bizarre enough input data, you can get bizarre output.

Alternatively, a more basic understanding of what you want is to choose index integers i[0] through i[k], such that

sub_lists[j] = original_list[i[j]:i[j+1]] 

with i[0]=0 and i[k+1] understood to mean "everything else in the list." Then define:

sub_lens = [len(s) for s in sub_lists]
max_len  = max(sub_lens)
criterion(k, i[0], ..., i[k]) = max(max_len - s_len for s_len in sub_lens)

So a solution for you is a tuple of parameters, (k, i[0], ..., i[k]) and you want the choice that minimized the above expression criterion.

A generic solution for this problem is quite complicated. But if you're willing to accept a greedy solution that will be very balanced except for the final sublist, many of these solutions will do.

查看更多
登录 后发表回答