I'm trying to get n random and non-overlapping slices of a sequence where each subsequence is of length l, preferably in the order they appear.
This is the code I have so far and it's gotten more and more messy with each attempt to make it work, needless to say it doesn't work.
def rand_parts(seq, n, l):
"""
return n random non-overlapping partitions each of length l.
If n * l > len(seq) raise error.
"""
if n * l > len(seq):
raise Exception('length of seq too short for given n, l arguments')
if not isinstance(seq, list):
seq = list(seq)
gaps = [0] * (n + 1)
for g in xrange(len(seq) - (n * l)):
gaps[random.randint(0, len(gaps) - 1)] += 1
result = []
for i, g in enumerate(gaps):
x = g + (i * l)
result.append(seq[x:x+l])
if i < len(gaps) - 1:
gaps[i] += x
return result
For example if we say rand_parts([1, 2, 3, 4, 5, 6], 2, 2)
there are 6 possible results that it could return from the following diagram:
[1, 2, 3, 4, 5, 6]
____ ____
[1, 2, 3, 4, 5, 6]
____ ____
[1, 2, 3, 4, 5, 6]
____ ____
[1, 2, 3, 4, 5, 6]
____ ____
[1, 2, 3, 4, 5, 6]
____ ____
[1, 2, 3, 4, 5, 6]
____ ____
So [[3, 4], [5, 6]]
would be acceptable but [[3, 4], [4, 5]]
wouldn't because it's overlapping and [[2, 4], [5, 6]]
also wouldn't because [2, 4]
isn't contiguous.
I encountered this problem while doing a little code golfing so for interests sake it would also be nice to see both a simple solution and/or an efficient one, not so much interested in my existing code.
This worked for me in Python 3.3.2. It should be backwards compatible with Python 2.7.
The above code will always print [[1, 2], [3, 4], [5, 6]]
Many solutions can be hacked for this problem, but one has to be careful if the sequences are to be strictly random. For example, it's wrong to begin by picking a random number between 0 and
len(seq)-n*l
and say that the first sequence will start there, then work recursively.The problem is equivalent to selecting randomly
n+1
integer numbers such that their sum is equal tolen(seq)-l*n
. (These numbers will be the "gaps" between your sequences.) To solve it, you can see this question.If you do it recursively it's much simpler. Take the first part from (so the rest will fit):
and then recurse with what left to do:
To understand this, first consider the case
l == 1
. Then it's basically just returning arandom.sample()
of the input data in sorted order; in this case theoffset
variable is always 0.The case where
l > 1
is an extension of the previous case. We userandom.sample()
to pick up positions, but maintain anoffset
to shift successive results: in this way, we make sure that they are non-overlapping ranges --- i.e. they start at a distance of at leastl
of each other, rather than 1.First of all, I think you need to clarify what you mean by the term random.
How can you generate a truly random list of sub-sequences when you are placing specific restrictions on the sub-sequences themselves?
As far as I know, the best "randomness" anyone can achieve in this context is generating all lists of sub-sequences that satisfy your criteria, and selecting from the pool however many you need in a random fashion.
Now based on my experience from an algorithms class that I've taken a few years ago, your problem seems to be a typical example which could be solved using a greedy algorithm making these big (but likely?) assumptions about what you were actually asking in the first place:
Assuming the above here is a greedy algorithm which generates one of the possible lists of sub-sequences in linear time (excluding sorting, which is O(n*log(n))):
The gist of the algorithm is as follows:
Hope this was what you were looking for.
EDIT: For some reason I wrongly assumed that there couldn't be repeating values in a sub-sequence, this one allows it.