Question:
Given an array (let's assume of non-negative integers), we are required to find the smallest-size subset such that the sum of its elements is no less than K. K is another integer provided as input.
Is it possible to have a solution with time complexity O(n) [big oh of n]?
My current thinking is along the following lines:
We could sort the array in O(n * log n) and then iterate over the sorted array starting from the largest number, maintaining a running sum until it becomes >= K.
However, this would have a worst-case running time of O(n * log n), dominated by the sort.
So if anyone could share ideas for doing this in O(n) time, I would really appreciate it.
Note: the elements of the subset don't have to be a contiguous sequence of the original array in this context.
Answer 1:
There is a linear time algorithm for finding the K largest numbers - http://en.wikipedia.org/wiki/Selection_algorithm. Of course what you want is just enough largest numbers to sum up to at least K.
In the standard selection algorithm, you take a random pivot and then look to see how many numbers fall on each side of it. You then either accept or reject one half, and keep working on the other half. You have just looked at each number in each half, in turn - the cost of each pivot stage is linear, but the amount of data considered at each stage reduces fast enough that the total cost is still only linear.
The cost of a pivot stage will still be only linear if you take the sum of all the numbers above the pivot. Using this, you can work out if accepting all these numbers, together with any numbers previously selected, would give you a collection of numbers that add up to at least K. If it does, you can ditch the other numbers and use the numbers above the pivot for the next pass. If it does not, you can accept all the numbers above the pivot, and use the numbers below the pivot for the next pass. As with the selection algorithm, the pivot itself and any ties give you a few special cases and the possibility of finding an exact answer early.
So I think you can do this in (randomized) linear time using a modified version of the selection algorithm, in which you look at the sum of the numbers above the pivot instead of how many numbers are above the pivot.
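As a concrete sketch of this idea (my own code, not the answerer's; the function and variable names are invented), an iterative Python version might look like:

```python
import random

def smallest_subset_sum_at_least(a, k):
    """Return a minimum-size subset of the non-negative ints in `a` whose sum
    is >= k, or None if even the whole array falls short.
    Expected O(n): each pass partitions around a random pivot and keeps only
    the half that still matters, as in quickselect."""
    if k <= 0:
        return []
    if sum(a) < k:
        return None
    chosen = []
    rest = a
    while k > 0:
        pivot = random.choice(rest)
        above = [x for x in rest if x > pivot]
        equal = [x for x in rest if x == pivot]
        below = [x for x in rest if x < pivot]
        sum_above = sum(above)
        if sum_above >= k:
            # enough mass above the pivot: the answer lies entirely there
            rest = above
        else:
            # accept everything above the pivot, then just enough pivot copies
            chosen += above
            k -= sum_above
            if pivot > 0:
                need = min(len(equal), -(-k // pivot))  # ceil(k / pivot)
                chosen += equal[:need]
                k -= need * pivot
            if k > 0:
                rest = below
    return chosen
```

Since taking the largest elements first is optimal for minimizing the subset size, for `[5, 2, 7, 52, 30, 12, 18]` and K = 100 this returns the three elements 52, 30 and 18 in some order.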
Answer 2:
This seems to be a problem for dynamic programming. As you build your array, you build another array containing the cumulative sum up to each particular index, so position i in that array holds the sum of elements 1..i.
Now it's easy to see that the sum of values for indices p..q is SUM(q) - SUM(p-1) (with the special case that SUM(0) is 0). Obviously I'm using 1-based indices here... This operation is O(1), so now you just need an O(n) algorithm to find the best one.
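The prefix-sum identity is easy to check in Python (my own illustration; the array values are arbitrary examples):

```python
from itertools import accumulate

A = [3, 1, 4, 1, 5, 9, 2, 6]     # arbitrary example values
SUM = [0] + list(accumulate(A))  # SUM[i] = A[1] + ... + A[i] (1-based); SUM[0] = 0

# sum of the elements p..q (1-based, inclusive) in O(1):
p, q = 3, 6
assert SUM[q] - SUM[p - 1] == sum(A[p - 1:q])  # 4 + 1 + 5 + 9 = 19
```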
A simple solution is to keep track of two indices p and q and walk them through the array. You expand with q to begin. Then you contract with p and expand with q repeatedly, like a caterpillar crawling through your array.
To expand q:
p <- 1
q <- 1
while SUM(q) - SUM(p-1) < K
    q <- q + 1
end while
Now q is at the position where the subarray sum has just reached (or exceeded) K. The length of the subarray is q - p + 1.
After the q loop you test whether the subarray length is less than your current best. Then you advance p by one step (so that you don't accidentally skip over the optimal solution) and go again.
You don't really need to create the SUM array; you can just build the subarray sum as you go. You would need to go back to using the 'real' p instead of the one just before.
subsum <- VAL(1)
p <- 1
q <- 1
while p <= N
    -- Expand
    while q < N and subsum < K
        q <- q + 1
        subsum <- subsum + VAL(q)
    end while
    -- Check the length against our current best,
    -- but only if the window sum actually reached K
    if subsum >= K
        len <- q - p + 1
        if len < bestlen
            ...
        end if
    end if
    -- Contract
    subsum <- subsum - VAL(p)
    p <- p + 1
end while
Notes:
j_random_hacker said: it would help to explain exactly why it is acceptable to examine just the O(n) distinct subarrays that this algorithm examines, instead of all O(n^2) possible distinct subarrays.
The dynamic programming philosophy is:
- do not follow solution paths that will lead to a non-optimal result; and
- use the knowledge of previous solutions to compute a new solution.
In this case a single solution candidate (some (p,q) such that p <= q) is computed by summing the elements. Because those elements are positive integers, we know that for any solution candidate (p,q), the candidate (p,q+1) will be larger.
And so we know that if (p,q) is a minimal solution then (p,q+1) is not. We end our search as soon as we have a candidate, and test whether that candidate is better than any we have seen so far. That means for each p, we only need to test one candidate. This leads to both p and q only ever increasing, and thus the search is linear.
The other part of this (using previous solutions) comes from recognising that sum(p,q+1) = sum(p,q) + X(q+1), and similarly sum(p+1,q) = sum(p,q) - X(p). Therefore, we do not have to sum all elements between p and q at every step; we only have to add or subtract one value whenever we advance one of the search pointers.
Hope that helps.
Answer 3:
The OP made it clear in his answers to comments that the problem is to find a subset, NOT necessarily a contiguous sequence (the term 'subarray' was admittedly poor). Then, I believe the method indicated by mcdowella is correct, comprising the following steps:
Starting out with N elements, find the MEDIAN element (i.e. the (N/2)-th element of the array as it would appear if sorted, which you don't have and don't construct). This is achieved with the "Median of Medians" algorithm, proven to be O(N); see the wiki reference already given and repeated here: Selection algorithm, section on the median-of-medians algorithm.
Having the median element: scan linearly the complete set, and partition in "below" and "above", meanwhile summing, counting and doing whatever you want to keep track of, for each of the "halves". This step is (also) O(N).
Having completed the scan, if the "upper half"-sum is above the target (K), you forget all about the lower half and repeat the procedure for the upper half, whose size is (roughly) N/2.
If, on the other hand, the "upper half"-sum is less than K, then you add that upper half to the final result, subtract its sum from K and repeat the procedure with the lower half.
Altogether, you process sets of size N, N/2, N/4, N/8 etcetera, each in O(M) with respect to their respective sizes M, and hence the overall stuff is also linear in N, because N + N/2 + N/4 + N/8 ... stays below 2N.
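The steps above can be sketched recursively in Python (my own code; `statistics.median` is an O(n log n) stand-in used for brevity, so substitute a median-of-medians selection to make the whole thing truly O(n)):

```python
from statistics import median  # stand-in; median-of-medians would make this O(n)

def smallest_sum_subset(a, k):
    """Smallest subset of non-negative ints in `a` with sum >= k, as a list,
    or None if even the whole array falls short."""
    if sum(a) < k:
        return None
    if k <= 0:
        return []
    m = median(a)  # for even-length lists this is the mean of the middle two
    upper = [x for x in a if x > m]
    mid = [x for x in a if x == m]
    lower = [x for x in a if x < m]
    s_up = sum(upper)
    if s_up >= k:
        # the upper half alone reaches K: forget the rest, recurse on it
        return smallest_sum_subset(upper, k)
    # otherwise keep the whole upper half, then median-valued elements,
    # then recurse into the lower half for whatever is still missing
    k -= s_up
    taken = []
    for x in mid:
        if k <= 0:
            break
        taken.append(x)
        k -= x
    if k <= 0:
        return upper + taken
    return upper + mid + smallest_sum_subset(lower, k)
```

Each recursive call works on (roughly) half the elements, which is what keeps the total work linear when the median step itself is linear.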
Answer 4:
Here is a solution that should be fast enough.
I'm guessing it's almost linear.
def solve(A, k):
    assert sum(A) >= k
    if k <= 0:
        return []
    max_ = max(A)
    min_ = min(A)
    n = len(A)
    if sum(A) - min_ < k:
        return A
    if max_ == min_:
        # guard (added): all values are equal, so take just enough of them;
        # without this the recursion below would never shrink the problem
        return A[:-(-k // min_)]
    bucket_size = (max_ - min_) // n + 1
    buckets = [[] for _ in range(n)]
    for item in A:
        buckets[(item - min_) // bucket_size].append(item)
    solution = []
    while True:
        bucket = buckets.pop()  # the last, i.e. largest-valued, bucket
        sum_ = sum(bucket)
        if sum_ >= k:
            # don't need everything from this bucket
            return solution + solve(bucket, k)
        else:
            k -= sum_
            solution += bucket

print(solve([5, 2, 7, 52, 30, 12, 18], 100))
# prints [52, 30, 18]
Answer 5:
I believe that the term "subarray" implies a contiguous part of the array (like here, another problem as an example).
So there is a simple O(n) algorithm to find the minimum-length subarray:
Set two indices (left, right) to the first element and move them toward the end of the array. Check the sum between these indices: advance the right pointer if the sum is too small (or the pointers are equal), and advance the left pointer if the sum is big enough.
Answer 6:
The subarray has to be contiguous by the definition of array.
Use two pointers (start, end), both initialized to the beginning of the array.
Track the current sum between (start, end) and move end to the right one step at a time. Every time you move the end pointer, sum = sum + array[end].
And when sum >= target, start moving start to the right and keep tracking the sum as sum = sum - array[start].
While moving start to the right, keep checking that the sum is still no less than target. We also need to track the length as length = end - start + 1, and the best as minLength = min(minLength, length).
Once we have moved both pointers as far right as possible, we just return minLength.
The general idea is to first find a "window" that satisfies the condition (sum >= target), then slide the window to the right one element at a time, keeping the window as small as possible each time it moves.
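Put together as runnable Python (my own transcription of the steps above; names are mine):

```python
def min_subarray_len(nums, target):
    """Length of the shortest contiguous subarray with sum >= target,
    or 0 if none exists. O(n): each index enters and leaves the window once."""
    start = 0
    window_sum = 0
    best = float('inf')
    for end, value in enumerate(nums):
        window_sum += value          # expand the window to the right
        while window_sum >= target:  # shrink from the left while still valid
            best = min(best, end - start + 1)
            window_sum -= nums[start]
            start += 1
    return 0 if best == float('inf') else best
```

For example, `min_subarray_len([2, 3, 1, 2, 4, 3], 7)` returns 2, for the window [4, 3].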