I have a problem related to the subset sum problem and am wondering if the differences make it easier, i.e. solvable in a reasonable amount of time.
Given a value V, a set size L, and a sequence S of N numbers, how many size-L subsets of S sum to less than V?
This differs from the subset sum problem in three ways:
- I care about subsets that sum to less than a given value, not subsets that sum to exactly that value.
- The subset sizes are fixed.
- I care how many sets sum to less than V, not just whether any exist.
Is there any reasonably efficient algorithm to solve this?
Edit:
Obviously, this can be done in O(N choose L) using a combination generating algorithm. What I'm really interested in is clever hacks to significantly speed it up past that.
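For reference, here is a minimal brute-force sketch of that baseline (Python; the function name is mine, not part of the problem statement):

```python
from itertools import combinations

def count_by_brute_force(S, L, V):
    """Count the size-L subsets of S whose sum is less than V -- O(N choose L) work."""
    return sum(1 for combo in combinations(S, L) if sum(combo) < V)

# Example: size-3 subsets of [1..6] that sum to less than 10
print(count_by_brute_force(range(1, 7), 3, 10))  # 7
```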
(The decision version of) your problem is still NP-complete. The idea is that if we could solve your problem, then (for each subset size, say) we could ask how many sets sum to less than V+1 and how many sum to less than V, and the difference of those two numbers would tell us whether there are subsets that sum to exactly V -- thus we could solve the subset sum problem. [This is not a complete proof, because it's a Turing reduction, not a many-one reduction.]
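To make the reduction concrete, here is a hedged sketch (Python; the brute-force counter is only a stand-in for whatever procedure answers the counting question):

```python
from itertools import combinations

def count_below(S, L, V):
    """Stand-in oracle: how many size-L subsets of S sum to strictly less than V."""
    return sum(1 for c in combinations(S, L) if sum(c) < V)

def some_subset_sums_to(S, T):
    """Subset sum via two counting queries per subset size: the difference between
    'sum < T+1' and 'sum < T' counts the subsets summing to exactly T."""
    return any(count_below(S, L, T + 1) - count_below(S, L, T) > 0
               for L in range(len(S) + 1))
```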
However, there is a simple dynamic programming solution that runs in time O(nLV). [The reason this does not prove that P=NP is that V could be exponential in the input size: with n bits, you can represent values up to 2^n. But assuming that your V is not exponential, this is not a problem.]
Let num[v][k][i] denote the number of size-k subsets of the first i elements of S that sum to v. You can calculate them as (for each i):
num[0][0][i] = 1
for v = 1 to V:
    for k = 1 to L:
        num[v][k][i] = num[v][k][i-1] + num[v-S[i]][k-1][i-1]    (the second term is 0 when v < S[i])
where S[i] is the ith element in your sequence. (Any set of size k that sums to v either doesn't use S[i], so it's counted in num[v][k][i-1], or it uses S[i], which means that the rest of the subset has k-1 elements, uses only the first i-1 numbers in the sequence, and sums to v-S[i].) Finally, sum num[v][L][|S|] over all v less than V; that's your answer.
Also, you can omit the third subscript if you do it carefully (run your loop downwards for each i, etc.); I only included it for clarity.
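A hedged Python sketch of this recurrence, assuming positive integers (the names are my own; it uses the rolling-array trick just described, iterating v downwards so the third subscript can be dropped):

```python
def count_by_dp(S, L, V):
    """Count size-L subsets of S (positive integers assumed) whose sum is < V.

    num[v][k] = number of size-k subsets of the elements processed so far
    that sum to exactly v.  Time O(n*L*V), space O(L*V).
    """
    if V <= 0:
        return 0
    num = [[0] * (L + 1) for _ in range(V)]   # only sums 0..V-1 are ever needed
    num[0][0] = 1                             # the empty subset
    for x in S:
        # Go downwards in v so num[v - x][k - 1] still holds the previous element's value.
        for v in range(V - 1, x - 1, -1):
            for k in range(L, 0, -1):
                num[v][k] += num[v - x][k - 1]
    return sum(num[v][L] for v in range(V))

# Matches the brute-force count above: size-3 subsets of [1..6] summing to less than 10
print(count_by_dp([1, 2, 3, 4, 5, 6], 3, 10))  # 7
```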
I'm not prepared to present a proof, but that sounds like it might be amenable to a dynamic programming scheme: tabulate the subsets of size 2, use them to compute the subsets of size 3, and so on, so that you only need to examine a small collection of prospects.
One optimization that comes to mind: order your sequence (if it isn't already sorted). Take the L-1 smallest items from the start of it, then find the largest item that can still be added to them without the sum reaching V (any larger value would make the sum too big). Discard everything beyond that item, because those elements can never be part of a valid subset anyway.
After that I guess it's full search again (a rough sketch of the pruning step is below). But then again there might be other optimizations possible too.
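Here is that pruning step sketched out (Python, positive integers assumed; the function name is mine):

```python
def prune(S, L, V):
    """Discard elements that cannot belong to any size-L subset summing to < V.

    With S sorted, the cheapest companions for a candidate element are the L-1
    smallest elements; once an element breaks the bound even with them, every
    larger element does too.
    """
    S = sorted(S)
    if L == 0 or L > len(S):
        return []
    base = sum(S[:L - 1])                     # the L-1 smallest elements
    return S[:L - 1] + [x for x in S[L - 1:] if base + x < V]

# Example: with L=3 and V=10, the elements 8 and 9 can never fit alongside 1 and 2.
print(prune([9, 1, 4, 2, 8, 3], 3, 10))  # [1, 2, 3, 4]
```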
Isn't this just the Knapsack problem with a twist? Could be I'm wrong though.
The dynamic programming solution to the subset sum problem generates a table that contains this answer (i.e. a boolean table of V by N, where V is the maximum sum and N is the maximum number of items allowed in a satisfying set; each boolean is true if some set of at most N elements sums to at most V). So if N * V is not too large for you, an acceptably fast algorithm exists. The subset sum solution is then just the highest reachable sum in this table for which the number of elements used is <= N/2.
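For what it's worth, a small sketch of such a table (Python; reachable[v][k] records whether some size-k subset sums to exactly v, so it answers existence only; replacing the booleans with integer counts gives the counting recurrence described earlier):

```python
def build_reachable(S, L, V):
    """reachable[v][k] is True iff some size-k subset of S sums to exactly v (for v < V)."""
    reachable = [[False] * (L + 1) for _ in range(V)]
    reachable[0][0] = True                    # the empty subset
    for x in S:
        for v in range(V - 1, x - 1, -1):     # downwards, as in the counting version
            for k in range(L, 0, -1):
                reachable[v][k] = reachable[v][k] or reachable[v - x][k - 1]
    return reachable
```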
If it's only positive integers, you can do a verification step if you need:
Take the sum of the L-1 smallest integers in the set; call it X. Then the smallest remaining element must be below V - X if the problem is to have any solution at all. Come to think of it, you can eliminate other elements this way too...
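A quick sketch of that check (Python; positive values assumed, and the function name is mine):

```python
def feasible(S, L, V):
    """Some size-L subset sums to less than V iff the L smallest elements already do."""
    S = sorted(S)
    return L <= len(S) and sum(S[:L]) < V
```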
Well, for one thing, since you're specifying size = L, even if you can't think of anything clever and just use brute force you'll have (N choose L) separate sums in the worst case, so it's a bit better than N^L (well, N^(L+1), as you'd then also sum each subset).
This sounds like an n choose k category of problem. Generating k-subsets of n is covered in Skiena's Algorithm Design Manual, and the book suggests enumerating relevant subsets in lexicographic order (recursively, for example). Then do your sum and comparison on each subset.
If you have a sorted set, you could presumably prune impossible solutions from the solution space.
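A hedged sketch of that approach (Python; recursive lexicographic generation over a sorted sequence, pruning a branch as soon as even its cheapest completion reaches V):

```python
def count_lex(S, L, V):
    """Count size-L subsets of S with sum < V via pruned lexicographic enumeration."""
    S = sorted(S)
    n = len(S)

    def rec(start, remaining, partial):
        if remaining == 0:
            return 1 if partial < V else 0
        count = 0
        for i in range(start, n - remaining + 1):
            # Cheapest completion using S[i]: S[i] plus the remaining-1 elements right after it.
            if partial + sum(S[i:i + remaining]) >= V:
                break                          # S is sorted, so later siblings only get larger
            count += rec(i + 1, remaining - 1, partial + S[i])
        return count

    return rec(0, L, 0)

# Matches the brute-force count: size-3 subsets of [1..6] summing to less than 10
print(count_lex([1, 2, 3, 4, 5, 6], 3, 10))  # 7
```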
Perhaps the dynamic programming formulation is amenable to a PTAS or FPTAS.