Create random sequence of integers within a range

2019-07-13 08:40发布

问题:

What is the fastest way to generate a specific nr of integers of random value uniformly distributed within a specific range and with a minimum distance between each element?

For example, given a sequence range between 0 and 20, we want to create 5 elements with at least 3 points distance between each element, the result could be something like this [0,5,11,14,19] or [2,5,9,13,18]

I created a loop that achieves this but it is very slow when i want to create ranges in the order of millions.

回答1:

How about the following recipe: If you want a gap of 3 between your 5 adjacent elements, but want a total range of 20, then you effectively have 20 - (5-1)*3 steps of "slack" that you can randomly distribute in the gaps between your elements. Suppose we generate a number in that range, and scatter it between the elements, then we end up with code something like the following:

import numpy, random

n = 5
limit = 20
mingap = 3

slack = 20 - mingap * (n - 1)

def generate():
    steps = random.randint(0, slack)

    increments = numpy.hstack([numpy.ones((steps,)), numpy.zeros((n,))])
    numpy.random.shuffle(increments)

    locs = numpy.argwhere(increments == 0).flatten()
    return numpy.cumsum(increments)[locs] + mingap * numpy.arange(0, n)

If you then invoke this generate() function ten times, you get a collection of vectors something like the following:

[  0.   3.   6.   9.  12.]
[  0.   3.   6.  10.  13.]
[  2.   5.   8.  12.  15.]
[  1.   4.   7.  12.  16.]
[  0.   4.   7.  10.  13.]
[  0.   3.   6.   9.  12.]
[  1.   4.   9.  12.  16.]
[  0.   7.  10.  13.  16.]
[  0.   5.   8.  11.  14.]
[  1.   4.   8.  11.  17.]


回答2:

This:

np.cumsum(np.ones((5,), np.int) * 3 + np.random.randint(0, maxn, (5,))) - 3

will give you 5 random numbers spaced by at least 3 points.

You have to tweak maxn to get the correct maximum value of the random numbers. Perhaps you may want to have a slightly bigger maxn and the reject samples whose elements exceed your maximum value (20).

Note: you didn't say what final distribution you are looking for, e.g. if you want the resulting samples uniformly distributed over the space of the valid samples, or anything else, if that matters.



回答3:

This answer is a followup to the comments on my previous answer.

You said you want uniformly distributed numbers, but that of course is not possible to have while respecting the condition that the numbers must be spaced by at least 3 points.

So, I provide you a different definition of uniformly randomness: suppose you can enumerate all the valid vectors respecting your condition. I wrote a function to do so:

def space_gen(xmin, xmax, len, min_dist, result=[]):
    if len:
        for x in range(xmin, xmax - (len - 1) * min_dist):
            yield from space_gen(x + min_dist, xmax, len - 1, min_dist, result + [x])
    else:
        yield result

Let's consider a smaller instance of the problem. Suppose you want vectors of 3 random numbers from 0 to 10 (excluded), spaced by at least 4 points:

>>> list(space_gen(0,10,3,4))
[[0, 4, 8], [0, 4, 9], [0, 5, 9], [1, 5, 9]]

that list is the complete enumeration of all valid results according to that rule.

Now you can uniformly sample from this list (see for example random.choice).

Now it's possible that your problem size (i.e. the range, or the vector size) make the problem instance too big to be exhaustively enumerated.

But you can still use this "brute force" enumeration to check if a method generates truly uniformly distributed samples.

For the problem instance of your question (0-20 range, 5 length, 3 min. dist) it's still doable:

>>> len(list(space_gen(0,21,5,3)))
1287

For example, we can check if rwp's recipe generates uniformly distributed samples (according to this definition):

space = list(space_gen(0, 21, 5, 3))
counter = {tuple(x): 0 for x in space}
for _ in range(200000):
    x = tuple(map(int,generate()))
    counter[x] += 1
import matplotlib.pyplot as plt
a = np.array(sorted(counter.values()))
plt.hist(a, bins=len(space))
plt.show()

and we observe this distribution of counts:

Clearly there are some vectors occurring way more often than other vectors.

We can also check the first solution I proposed:

def generate1():
    maxn=15
    while 1:
        x = np.cumsum(np.ones((5,), np.int) * 3 + np.random.randint(0, maxn, (5,))) - 3
        if x[-1] <= 20:
            return x

even with maxn=15 and using rejection sampling, it's still a bit skew and not perfectly uniform. Using the same benchmark/plot code as before: