Suppose you need n_1 in [10,30], n_2 in [20,40], n_3 in [30,50] and n1+n2+n3=90
If you need each possible triplet (n_1, n_2, n_3) to be equally-likely, that's going to be difficult. The number of triples of the form (20, n_2, n_3) is greater than the number of triples (10, n_2, n_3), so you can't just pick n_1 uniformly.
The incredibly slow but accurate way is to generate the all 5 randoms in the correct ranges and reject the whole group if the sum is not correct.
. . . AHA!
I found a way to parametrize the choice effectively. First, though, for simplicity note that the sum of the low bounds is the minimum possible sum. If subtract the sum of the low bounds from the target number and subtract the low bound from each generated number, you get a problem where each number is in the interval [0, max_k-min_k]. That simplifies the math and array (list) handling. Let n_k be the 0-based choice with 0<=n_k<=max_k-min_k.
The order of the sums is lexicographic, with all sums beginning with n_1=0 (if any) first, then n_1==1 sums, etc. Sums are sorted by n_2 in each of those groups, then by n_3, and so on. If you know how many sums add to the target (call that T), and how many sums start with n_1=0, 1, 2, ... then you can find the starting number n1 of sum number S in in that list. Then you can reduce the problem to adding n_2+n_3+... to get T-n_1, finding sum number S - (number original sums starting with number less than n_1).
Let pulse(n) be a list of n+1 ones: (n+1)*[1] in Python terms. Let max_k,min_k be the limits for the k'th choice, and m_k = max_k-min_k be the upper limit for 0-based choices. Then there are 1+m_1 different "sums" from the choice of the first number, and pulse(m_k) gives the distribution: 1 was to make each sum from 0 to m_1. For the first two choices, there are m_1+m_+1 different sums. It turns out that the convolution of pulse(m_1) with pulse(m_2) gives the distribution.
Time to stop for some code:
def pulse(width, value=1):
''' Returns a vector of (width+1) integer ones. '''
return (width+1)*[value]
def stepconv(vector, width):
''' Computes the discrete convolution of vector with a "unit"
pulse of given width.
Formula: result[i] = Sum[j=0 to width] 1*vector[i-j]
Where 0 <= i <= len(vector)+width-1, and the "1*" is the value
of the implied unit pulse function: pulse[j] = 1 for 0<=j<=width.
'''
result = width*[0] + vector;
for i in range(len(vector)):
result[i] = sum(result[i:i+width+1])
for i in range(len(vector), len(result)):
result[i] = sum(result[i:])
return result
That's coded specifically for only doing convolutions with a "pulse" array, so every linear combination in the convolution is just a sum.
Those are used only in the constructor of the final class solution:
class ConstrainedRandom(object):
def __init__(self, ranges=None, target=None, seed=None):
self._rand = random.Random(seed)
if ranges != None: self.setrange(ranges)
if target != None: self.settarget(target)
def setrange(self, ranges):
self._ranges = ranges
self._nranges = len(self._ranges)
self._nmin, self._nmax = zip(*self._ranges)
self._minsum = sum(self._nmin)
self._maxsum = sum(self._nmax)
self._zmax = [y-x for x,y in self._ranges]
self._rconv = self._nranges * [None]
self._rconv[-1] = pulse(self._zmax[-1])
for k in range(self._nranges-1, 0, -1):
self._rconv[k-1] = stepconv(self._rconv[k], self._zmax[k-1])
def settarget(self, target):
self._target = target
def next(self, target=None):
k = target if target != None else self._target
k = k - self._minsum;
N = self._rconv[0][k]
seq = self._rand.randint(0,N-1)
result = self._nranges*[0]
for i in range(len(result)-1):
cv = self._rconv[i+1]
r_i = 0
while k >= len(cv):
r_i += 1
k -= 1
while cv[k] <= seq:
seq -= cv[k]
r_i += 1
k -= 1
result[i] = r_i
result[-1] = k # t
return [x+y for x,y in zip(result, self._nmin)]
# end clss ConstrainedRandom
Use that with:
ranges = [(low, high), (low, high), ...]
cr = ConstrainedRandom(ranges, target)
seq = cr.next();
print(seq)
assert sum(seq)==target
seq = cr.next(); # get then get the next one.
...etc. The class could be trimmed down a bit, but the main space overhead is in the _rconv list, which has the stored convolutions. That's roughly N*T/2, for O(NT) storage.
The convolutions only use the ranges, with a lot of randoms generated with the same constraints, the table construction time "amortizes away" to zero. The time complexity of .next() is roughly T/2 on average and O(T), in terms of the number of indexes into the _rconv lists.
To see how the algorithm works, assume a sequence of 3 zero-based choices, with max values (5,7,3), and a 0-based target T=10. Define or import the pulse and stepconv functions in an Idle session, then:
>>> pulse(5)
[1, 1, 1, 1, 1, 1]
>>> K1 = pulse (5)
>>> K2 = stepconv(K1, 7)
>>> K3 = stepconv(K2, 3)
>>> K1
[1, 1, 1, 1, 1, 1]
>>> K2
[1, 2, 3, 4, 5, 6, 6, 6, 5, 4, 3, 2, 1]
>>> K3
[1, 3, 6, 10, 14, 18, 21, 23, 23, 21, 18, 14, 10, 6, 3, 1]
>>> K3[10]
18
>>> sum(K3)
192
>>> (5+1)*(7+1)*(3+1)
192
K3[i] shows the number of different choice n_1, n_2, n_3 such that 0 <= n_k <= m_k and Σ n_k = i. Letting * mean convolution when applied to two of these lists. Then pulse(m_2)*pulse(m_3) is gives the distribution of sums of n_2 and n_3:
>>> R23 = stepconv(pulse(7),3)
>>> R23
[1, 2, 3, 4, 4, 4, 4, 4, 3, 2, 1]
>>> len(R23)
11
Every value from 0 to T=10 is (barely) possible, so any choice is possible for the first number and there are R23[T-n_1] possible triplets adding to T=10 that start with N1. So, once you've found that there are 18 possible sums adding to 10, generate a random number S = randint(18) and count down through the R23[T:T-m_1-1:-1] array:
>>> R23[10:10-5-1:-1]
[1, 2, 3, 4, 4, 4]
>>> sum(R23[10:10-5-1:-1])
18
Note the sum of that list is the total computed in K3[10] above. A sanity check. Anyway, if S==9 was the random choice, then find how many leading terms of that array can be summed without exceeding S. That's the value of n_1. In this case 1+2+3 <= S but 1+2+3+4 > S, so n_1 is 3.
As described above, you can then reduce the problem to find n_2. The final number (n_3 in this example) will be uniquely determined.
One could count the number of ways to make each possible total using two spans, four spans, eight spans, and so forth (where a span is a range of integers including its endpoints). With those numbers tabulated, you can work backwards toward a sample. For example, suppose there are 16 spans, each including numbers 1 to 9. One finds there are w = 202752772954792 ways to make a total of t = 100
. Choose a random number r in the range 1 to w. Search (or lookup) to find a number J such that the sum with J>j of leftways(t-j)*rightways(j))
exceeds r and the sum with J>j+1 does not, where leftways(i)
is the number of ways of making i using the first 8 spans, and rightways(j)
is the number of ways of making j using the last 8 spans. Recurse using i = t-j with the first 8 spans and j with the last 8, etc. Base cases occur whenever there is only one way of making a required total.
The code below can be revised to run more efficiently by sorting the spans into descending order by width (that is, list the widest spans first) and removing some swaps. Note that if spans are not in descending order by width, the result vector will not be in the same order as the spans. Also, it might be possible to replace the linear for j ...
search in findTarget
by a binary search but I haven't figured out how to do so without using quite a bit more space.
The code could be made cleaner and more clear by using objects to store the tree structures, instead of just tuples.
Space used probably is O(s²·(lg m))
if there are m spans whose maxima total up to s. Time used is O(s²·(lg m))
for the initial tabulation of sums of products and probably O(m+(lg m)·(s/m))
or O(m+(lg m)·s)
for each sample. (I haven't properly analyzed space and time requirements.) On a 2GHz machine, the code as shown produces about 8000 samples per second if there are 16 identical spans 1...10; about 5000 samples per second if there are 32 identical spans 1...3; and about 2000 samples per second if there are 32 identical spans 1...30. Some sample outputs for various target totals and sets of spans are shown after the code.
from random import randrange
def randx(hi): # Return a random integer from 1 to hi
return 1+randrange(hi) if hi>0 else 0
# Compute and return c with each cell k set equal to
# sum of products a[k-j] * b[j], taken over all relevant j
def sumprods(lt, rt):
a, b = lt[0], rt[0]
(la,ma,aa), (lb,mb,bb) = a, b
if ma-la < mb-lb: # Swap so |A| >= |B|
la, ma, aa, lb, mb, bb = lb, mb, bb, la, ma, aa
lc, mc = la+lb, ma+mb
counts = [0]*(mc+1)
for k in range(lc,mc+1):
for j in range(max(lb,k-ma), min(mb,k-la)+1):
counts[k] += aa[k-j] * bb[j]
return (lc, mc, counts)
def maketree(v):
lv = len(v)
if lv<2:
return (v[0], None, None)
ltree = maketree(v[:lv/2])
rtree = maketree(v[lv/2:])
return (sumprods(ltree, rtree), ltree, rtree)
def findTarget(tototal, target, tree):
global result
lt, rt = tree[1], tree[2]
# Put smaller-range tree second
if lt[0][1]-lt[0][0] < rt[0][1]-rt[0][0]: lt, rt = rt, lt
(la,ma,aa), (lb,mb,bb) = lt[0], rt[0]
lc, mc = la+lb, ma+mb
count = 0
for j in range(max(lb,tototal-ma), min(mb,tototal-la)+1):
i = tototal-j
count += aa[i] * bb[j]
if count >= target: break
# Suppose that any way of getting i in left tree is ok
if lt[1]: findTarget(i, randx(aa[i]), lt)
else: result += [i]
# Suppose that any way of getting j in right tree is ok
if rt[1]: findTarget(j, randx(bb[j]), rt)
else: result += [j]
spans, ttotal, tries = [(1,6), (5,11), (2,9), (3,7), (4,9), (4,12), (5,8),
(3,5), (2,9), (3,11), (3,9), (4,5), (5,9), (6,13),
(7,8), (4,11)], 100, 10
spans, ttotal, tries = [(10*i,10*i+9) for i in range(16)], 1300, 10000
spans, ttotal, tries = [(1,3) for i in range(32)], 64, 10000
spans, ttotal, tries = [(1,10) for i in range(16)], 100, 10
print 'spans=', spans
vals = [(p, q, [int(i>=p) for i in range(q+1)]) for (p,q) in spans]
tree = maketree(vals)
nways = tree[0][2][ttotal]
print 'nways({}) = {}'.format(ttotal, nways)
for i in range(1,tries):
result = []
findTarget(ttotal, randx(nways), tree)
print '{:2}: {} {}'.format(i, sum(result), result)
In the output samples shown below, the lines with colons contain a sample number, a sample-total, and a sample-values array. Other lines show the set of spans and the number of ways of making a desired total.
spans= [(1, 10), (1, 10), (1, 10), (1, 10), (1, 10), (1, 10), (1, 10), (1, 10), (1, 10), (1, 10), (1, 10), (1, 10), (1, 10), (1, 10), (1, 10), (1, 10)]
nways(100) = 202752772954792
1: 100 [8, 9, 1, 2, 8, 1, 10, 6, 6, 3, 6, 10, 6, 10, 10, 4]
2: 100 [2, 6, 10, 3, 1, 10, 9, 5, 10, 6, 2, 10, 9, 7, 4, 6]
3: 100 [1, 6, 5, 1, 9, 10, 10, 7, 10, 2, 8, 9, 10, 9, 2, 1]
4: 100 [10, 7, 6, 3, 7, 8, 6, 5, 7, 7, 5, 1, 9, 6, 9, 4]
5: 100 [7, 1, 10, 5, 5, 4, 9, 5, 3, 9, 2, 8, 6, 8, 10, 8]
spans= [(1, 3), (1, 3), (1, 3), (1, 3), (1, 3), (1, 3), (1, 3), (1, 3), (1, 3), (1, 3), (1, 3), (1, 3), (1, 3), (1, 3), (1, 3), (1, 3), (1, 3), (1, 3), (1, 3), (1, 3), (1, 3), (1, 3), (1, 3), (1, 3), (1, 3), (1, 3), (1, 3), (1, 3), (1, 3), (1, 3), (1, 3), (1, 3)]
nways(64) = 159114492071763
1: 64 [2, 2, 1, 3, 1, 2, 2, 2, 1, 2, 3, 3, 3, 2, 2, 1, 2, 3, 1, 3, 1, 3, 2, 1, 2, 3, 2, 2, 1, 2, 2, 2]
2: 64 [1, 2, 1, 1, 1, 3, 3, 3, 2, 1, 1, 2, 3, 2, 2, 3, 3, 3, 1, 2, 1, 2, 2, 3, 2, 2, 1, 3, 1, 3, 2, 2]
3: 64 [2, 1, 3, 2, 3, 3, 1, 3, 1, 3, 2, 2, 1, 2, 1, 3, 1, 3, 1, 2, 2, 2, 2, 1, 1, 3, 2, 2, 3, 2, 3, 1]
4: 64 [2, 3, 3, 2, 3, 2, 1, 3, 2, 2, 1, 2, 1, 1, 3, 2, 2, 3, 3, 1, 1, 2, 2, 1, 1, 2, 3, 3, 2, 1, 1, 3]
5: 64 [1, 1, 1, 3, 2, 2, 3, 2, 2, 1, 2, 2, 1, 2, 1, 1, 3, 3, 2, 3, 1, 2, 2, 3, 3, 3, 2, 2, 1, 3, 3, 1]
spans= [(0, 9), (10, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, 89), (90, 99), (100, 109), (110, 119), (120, 129), (130, 139), (140, 149), (150, 159)]
nways(1323) = 5444285920
1: 1323 [8, 19, 25, 35, 49, 59, 69, 76, 85, 99, 108, 119, 129, 139, 148, 156]
2: 1323 [8, 16, 29, 39, 48, 59, 69, 77, 88, 95, 109, 119, 129, 138, 147, 153]
3: 1323 [9, 16, 28, 39, 49, 58, 69, 79, 87, 96, 106, 115, 128, 138, 149, 157]
4: 1323 [8, 17, 29, 36, 45, 58, 69, 78, 89, 99, 106, 119, 128, 135, 149, 158]
5: 1323 [9, 16, 27, 34, 48, 57, 69, 79, 88, 99, 109, 119, 128, 139, 144, 158]
spans= [(1, 30), (1, 30), (1, 30), (1, 30), (1, 30), (1, 30), (1, 30), (1, 30), (1, 30), (
1, 30), (1, 30), (1, 30), (1, 30), (1, 30), (1, 30), (1, 30), (1, 30), (1, 30), (1, 30), (
1, 30), (1, 30), (1, 30), (1, 30), (1, 30), (1, 30), (1, 30), (1, 30), (1, 30), (1, 30), (
1, 30), (1, 30), (1, 30)]
nways(640) = 19144856039395888221416547336829636235610525
1: 640 [7, 24, 27, 9, 30, 23, 30, 27, 28, 29, 2, 30, 28, 19, 7, 27, 10, 2, 21, 23, 24, 2
7, 24, 16, 29, 8, 13, 23, 2, 19, 27, 25]
2: 640 [30, 2, 17, 28, 30, 16, 5, 1, 26, 24, 22, 19, 26, 16, 16, 30, 27, 15, 19, 30, 15,
30, 22, 5, 30, 9, 13, 25, 19, 15, 30, 28]
3: 640 [2, 24, 1, 23, 20, 5, 30, 22, 24, 19, 22, 9, 28, 29, 5, 24, 14, 30, 24, 16, 26, 2
1, 26, 20, 20, 19, 24, 29, 24, 8, 23, 29]
4: 640 [7, 20, 16, 24, 22, 14, 28, 28, 26, 8, 21, 9, 22, 24, 28, 19, 5, 13, 9, 24, 25, 2
2, 29, 18, 20, 21, 17, 26, 30, 9, 26, 30]