I have series of numbers like this
myvar = [57, 71, 87, 97, 99, 101, 103, 113, 114, 115, 128, 129, 131, 137, 147, 156, 163, 186]
Now I want to calculate all such possible combinations (of length 1 to 20) whose sum is equal to a given number m
.
I tried to solve with following code as :
def sum_count(m): ## Where m is the sum required
from itertools import combinations
myseq = []
for i in range(1,len(myvar)):
mycomb = list(combinations(mass,i)); # Getting combinations of length i
mycomb = [list(j) for j in mycomb];
for j in range(len(mycomb)-1,-1,-1):
if sum(mycomb[j]) == m:
myseq.append(mycomb[j])
return(myseq)
When I put m = 270
(for example) it gives me :
[[114, 156], [57, 99, 114]]
But is quite evident from the myvar
that there are still other combinations which have a sum equal to 270. Where am I failing to comprehend.
TL;DR:
Discuss different methods, best method is listed here for ease of access and was originally written by thefourtheye:
def subsets_with_sum(lst, target, with_replacement=False):
x = 0 if with_replacement else 1
def _a(idx, l, r, t):
if t == sum(l): r.append(l)
elif t < sum(l): return
for u in range(idx, len(lst)):
_a(u + x, l + [lst[u]], r, t)
return r
return _a(0, [], [], target)
note: the above method is modified with improvements from the original version below
Original Post:
Well - A quick and simple application of your data with some logic concludes that you have the correct answer:
# data
vals = [57, 71, 87, 97, 99, 101, 103, 113, 114, 115, 128, 129, 131, 137, 147, 156, 163, 186]
target = 270
Using itertools.combinations
:
>>> from itertools import combinations
>>> [comb for i in range(1, 20) for comb in combinations(vals, i) if sum(comb) == target]
[(114, 156), (57, 99, 114)]
However, maybe you wanted to use combinations_with_replacement
which lets values be used multiple times from the initial list as opposed to only once.
Using itertools.combinations_with_replacement
:
>>> from itertools import combinations_with_replacement
>>> [comb for i in range(1, 20) for comb in combinations_with_replacement(vals, i) if sum(comb) == target]
>>> # result takes too long ...
You can make it into a robust function:
def subsets_with_sum(lst, target, subset_lengths=range(1, 20), method='combinations'):
import itertools
return [comb for i in subset_lengths for comb in
getattr(itertools, method)(lst, i) if sum(comb) == target]
>>> subsets_with_sum(vals , 270)
[(114, 156), (57, 99, 114)]
Another method, provided by thefourtheye , it is much faster, and requires no imports:
def a(lst, target, with_replacement=False):
def _a(idx, l, r, t, w):
if t == sum(l): r.append(l)
elif t < sum(l): return
for u in range(idx, len(lst)):
_a(u if w else (u + 1), l + [lst[u]], r, t, w)
return r
return _a(0, [], [], target, with_replacement)
>>> s = [57, 71, 87, 97, 99, 101, 103, 113, 114, 115, 128, 129, 131, 137, 147, 156, 163, 186]
>>> a(s, 270)
[[57, 99, 114], [114, 156]]
>>> a(s, 270, True)
[[57, 57, 57, 99], [57, 57, 156], [57, 71, 71, 71], [57, 99, 114], [71, 71, 128], [114, 156]]
Timing:
def a(lst, target, with_replacement=False):
def _a(idx, l, r, t, w):
if t == sum(l): r.append(l)
elif t < sum(l): return
for u in range(idx, len(lst)):
_a(u if w else (u + 1), l + [lst[u]], r, t, w)
return r
return _a(0, [], [], target, with_replacement)
def b(lst, target, subset_lengths=range(1, 21), method='combinations'):
import itertools
return [comb for i in subset_lengths for comb in
getattr(itertools, method)(lst, i) if sum(comb) == target]
vals = [57, 71, 87, 97, 99, 101, 103, 113, 114, 115, 128, 129, 131, 137, 147, 156, 163, 186]
from timeit import timeit
print 'no replacement'
print timeit("a(vals, 270)", "from __main__ import vals, a", number=10)
print timeit("b(vals, 270)", "from __main__ import vals, b", number=10)
print 'with replacement'
print timeit("a(vals, 270, True)", "from __main__ import vals, a", number=10)
print timeit("b(vals, 270, method='combinations_with_replacement')", "from __main__ import vals, b", number=10)
Timing Output:
no replacement
0.0273933852733
0.683039054001
with replacement
0.0177899423427
... waited a long time ... no results ...
conclusion:
The new method (a) is at least 20 times faster.
If you simply want a count of the number of combinations then use this:
aminoacid_masses = [57, 71, 87, 97, 99, 101, 103, 113, 114, 115, 128, 129, 131, 137, 147, 156, 163, 186]
def peptides(n, d):
for m in aminoacid_masses:
if n-m in d:
d[n] = d.get(n,0)+d[n-m]
return d
def pep_counter(M):
dicc = {0:1}
mn = min(aminoacid_masses)
for i in range(M-mn+1):
j = i+mn
peptides(j,dicc)
return dicc
# This line calls the routine and indexes the returned dict. Both with the desired mass (the mass we want peptides to sum up to)
print(pep_counter(1024)[1024])