Let's say this code
random.seed(42)
random.sample(range(0,40), 4)
Output:[7, 1, 17, 15]
What should I change in this code to generate random numbers where the minimum distance between any two numbers in the list will be be at least 10 or more. Something like [0, 10, 25, 39] or [0, 12, 23, 38 ]
.
Possible duplicate would be this. Thanks.
相关问题
- how to define constructor for Python's new Nam
- streaming md5sum of contents of a large remote tar
- How to get the background from multiple images by
- Django __str__ returned non-string (type NoneType)
- Evil ctypes hack in python
One-line solution for the sorted case
Here's a simple one-liner, that generates all possibilities with equal likelihood:
A few sample outputs:
Outputs are always generated in sorted order; if that's not what you want, you can easily add a shuffle to the result (or see below for a general solution).
Explanation: if
[a, b, c, d]
is an ordered list satisfying your requirements, then[a, b-9, c-18, d-27]
is an ordered sample of length 4 fromrange(13)
, and vice versa. So all you need to do is generate samples fromrange(13)
, sort them, and then re-add the necessary multiples of9
to get values that are at least10
apart.General unsorted solution
Here's a general solution that doesn't require sorting of the random sample. Instead, we compute the ranks of the elements of the sample and use those to compute the necessary offsets.
And some sample outputs:
The "cheap trick" solution
If the various constants here in the original problem are fixed (population
range(40)
, samples of length 4, minimum distance of 10), then there's an obvious cheap trick: there are only715
possible different sorted samples, so just pre-create a list containing all of them, and then every time you need to generate a sample, pick one from that pre-created list usingrandom.choice
.For the generation, we can either go with a grossly inefficient but clearly correct brute force solution:
This is still plenty fast enough, taking only a couple of seconds on my machine. Alternatively, we can do something more refined and direct using the same bijection as identified above.
Either way, we generate the list of samples just once, and then use
random.choice
to pick one every time we need it:Of course, this solution doesn't scale well: for 7 samples out of
range(100)
with a minimum distance of 5, there are over 2 billion possible different sorted samples.Demonstration of uniformity
I claimed earlier that the one-liner produces all possibilities with equal likelihood (assuming a perfect source of random numbers, of course, but Python's Mersenne Twister is good enough that we're unlikely to detect statistical anomalies arising from the core generator in the test below). Here's a demonstration of that uniformity.
First, for convenience, we'll wrap our one-liner in a function. We'll also change it to return a
tuple
instead of alist
, because for the next step we want something hashable.Now we generate 10 million samples (this will take a couple of minutes), and count how often each one occurs:
A couple of quick checks:
We've collected 715 different combinations, and a little bit of maths tells us that that's exactly the number we expect (13 choose 4), so with a uniform distribution we'd expect each combination to occur approximately
10**7 / 715
times, or somewhere around 14000 times. Both the combinations we checked above are around 14000, as are the minimum and maximum counts appearing, but not surprisingly, there's some random variation.Is that random variation within acceptable limits? To find out, we can do a chi-squared test with
p = 0.01
. Our null hypothesis is that the population we're drawing from is uniform: i.e., that our code generates each possible sample with equal likelihood.SciPy makes a chi-squared test for uniformity easy:
The p-value we get is not smaller than 0.01, so we fail to reject the null hypothesis: that is, we have no evidence of non-uniformity.
Since the 4 numbers have to each keep a distance of 10, that leaves a "wiggle room" of just 10 out of 40 for the 4 numbers to be randomly distributed in (because 40 - 3 * 10 = 10). You can therefore simply randomize 4 numbers within the room of 10, calculate the deltas, and add the deltas and the corresponding 10s to get the full list.
A sample of 10 runs:
If the sample size remains proportional to the length of your domain, then one option is to shuffle the domain and pick the first four elements which satisfy the requirement.
Using a set to keep track of which numbers are excluded allows the process to be efficient.
Code
Example of output
Time-complexity
Since
random.shuffle
runs in O(n) and the algorithm traverses the shuffled list once, the algorithm is O(n * step).The algorithm being linear with regard to the domain length is the reason for the requirement for the sample size to be proportional to the domain size, otherwise the list might be shuffled for picking only a few elements.
Once you've generated a number, it removes a swath out of your range, since your know that no number can be within +/- 10 of the original.
A naive way to implement this would be to make a list of remaining numbers, and cut chunks out of it every time you pick a number:
Keep in mind that every sample removes up to 19 elements from your domain. That means that you are by no means guaranteed to get 4 elements in the result, but at least 3 are guaranteed.
Depending on the distribution you want, you could do this: