Question:
I am trying to create a generator (iterator which supports doing a next, perhaps using yield in python) which gives all combinations of r elements from {1,2,...n} (n and r are parameters) such that in the selected r elements, no two are consecutive.
For example, for r = 2 and n = 4, the generated combinations are {1,3}, {1,4}, {2,4}.
I could generate all combinations (as an iterator) and filter out those which don't satisfy the criteria, but that would be doing unnecessary work.
Is there some generation algorithm where next is O(1) (and if that is not possible, O(r) or O(n))?
The order in which the sets are returned is not relevant (and hopefully will allow an O(1) algorithm).
Note: I have tagged it python, but a language-agnostic algorithm will help too.
Update:
I have found a way to map it to generating pure combinations! A web search reveals that O(1) is possible for combinations (though it seems complicated).
Here is the mapping.
Suppose we have a combination x_1, x_2, ..., x_r with x_1 + 1 < x_2, x_2 + 1 < x_3, ... We map it to y_1, y_2, ..., y_r as follows:

y_1 = x_1
y_2 = x_2 - 1
y_3 = x_3 - 2
...
y_r = x_r - (r-1)

This way we have y_1 < y_2 < ... < y_r, without the non-consecutive constraint!
This basically amounts to choosing r elements out of n-r+1. Thus all I need to do is run the generation for (n-r+1 choose r).
For our purposes, using the mapping after things are generated is good enough.
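For concreteness, here is a minimal sketch of that mapping in Python (the function name is mine; Answer 5 below implements the same idea as combinations_knoothe):

    import itertools

    def nonconsecutive(n, r):
        # choose r of the n-r+1 "compressed" values, then undo the mapping:
        # x_i = y_i + (i-1), which re-inserts the gaps between elements
        for y in itertools.combinations(range(1, n - r + 2), r):
            yield tuple(y_i + i for i, y_i in enumerate(y))

    print(list(nonconsecutive(4, 2)))
    # [(1, 3), (1, 4), (2, 4)]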
Reasons for choosing svckr's answer
All great answers, but I have chosen svckr's answer. Here are some reasons why:
It is effectively stateless (or "Markovian", to be more precise). The next combination can be generated from the previous one. It is in a way almost optimal: O(r) space and time.
It is predictable. We know exactly the order (lexicographic) in which the combinations are generated.
These two properties make it easy to parallelise the generation (split at predictable points and delegate), with fault tolerance thrown in (can pick off from the last generated combination if a CPU/machine fails)!
Sorry, parallelisation was not mentioned earlier; it didn't occur to me when I wrote the question, and I only got the idea later.
Answer 1:
This is fun! How about this:
def nonconsecutive_combinations(r, n):
    # first combination, starting at 1, step size 2
    combination = list(range(1, r*2, 2))
    # as long as all items are less than or equal to n
    while combination[r-1] <= n:
        yield tuple(combination)
        p = r-1  # pointer to the element we will increment
        a = 1    # number that will be added to the last element
        # find the rightmost element we can increment without
        # making the last element bigger than n
        while p > 0 and combination[p] + a > n:
            p -= 1
            a += 2
        # increment the item and
        # fill the tail of the list with increments of two
        combination[p:] = range(combination[p]+1, combination[p] + 2*(r-p), 2)
Each next() call should be O(r).
I got the idea while thinking about how this would translate to natural numbers, but it took quite some time to get the details right.
> list(nonconsecutive_combinations(2, 4))
[(1, 3), (1, 4), (2, 4)]
> list(nonconsecutive_combinations(3, 6))
[(1, 3, 5), (1, 3, 6), (1, 4, 6), (2, 4, 6)]
Let me try to explain how this works.
Conditions for a tuple c with r elements to be part of the result set:

1. Every element of the tuple is at least as large as the preceding element plus 2 (c[x] >= c[x-1] + 2).
2. All elements are less than or equal to n. Because of 1., it is sufficient to say that the last element is less than or equal to n (c[r-1] <= n).
The smallest tuple that may be part of the result set is (1, 3, 5, ..., 2*r-1). When I say a tuple is "smaller" than another, I am assuming the lexicographical order. As Blckknght points out, even the smallest possible tuple may be too large to satisfy condition 2.
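As a quick sanity check (this helper is mine, not part of the answer's algorithm), the two conditions can be written as a predicate:

    def is_valid(c, n):
        # condition 1: each element exceeds its predecessor by at least 2
        # condition 2: the last element is at most n
        return all(c[x] >= c[x-1] + 2 for x in range(1, len(c))) and c[-1] <= n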
The function above contains two while loops:
The outer loop steps through the results and assumes they appear in lexicographical order and satisfy condition one. As soon as the tuple in question violates condition two, we know that we have exhausted the result set and are done:
    combination = list(range(1, r*2, 2))
    while combination[r-1] <= n:

The first line initializes the result tuple with the first possible result according to condition one. The second line translates directly into condition two.
The inner loop finds the next possible tuple satisfying condition one.
    yield tuple(combination)

Since the while condition (condition two) is true and we made sure the result satisfies condition one, we can yield the current result tuple.
Next, to find the lexicographically next tuple, we would add "1" to the last element.
# we don't actually do this:
combination[r-1] += 1
However, that may break condition two too early. So, if that operation would break condition two, we increment the preceding element and adjust the last element accordingly. This is a little like counting integers in base 10: "If the last digit is larger than 9, increment the previous digit and make the last digit a 0." But instead of adding zeros, we fill the tuple so that condition one is true.
# if above does not work
combination[r-2] += 1
combination[r-1] = combination[r-2] + 2
The problem is that the second line may break condition two yet again. So what we actually do is keep track of what the last element would become; that is what the variable a is for. We also use the variable p to refer to the index of the current element we are looking at.
    p = r-1
    a = 1
    while p > 0 and combination[p] + a > n:
        p -= 1
        a += 2
We are iterating right-to-left (p = r-1, p -= 1) through the items of the result tuple. Initially we want to add one to the last item (a = 1), but when stepping through the tuple we actually want to replace the last item with the value of a preceding item plus 2*x, where x is the distance between the two items (a += 2, combination[p] + a).
Finally, we have found the item we want to increment, and fill the rest of the tuple with a sequence starting at the incremented item, with a step size of 2:
combination[p:] = range(combination[p]+1, combination[p] + 2*(r-p), 2)
And that's it. It seemed so easy when I first thought of it, but all the arithmetic throughout the function makes it a great place for off-by-one errors, and describing it is harder than it should be. I should have known I was in trouble when I added that inner loop :)
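To see both loops in action, here is the r=3, n=6 sequence from above, stepped one next() at a time:

    g = nonconsecutive_combinations(3, 6)
    print(next(g))  # (1, 3, 5)  the smallest possible tuple
    print(next(g))  # (1, 3, 6)  last element incremented (p=2, a=1)
    print(next(g))  # (1, 4, 6)  p moved left once, tail refilled
    print(next(g))  # (2, 4, 6)  p moved all the way to the first element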
On performance…

Unfortunately, while loops filled with arithmetic are not the most efficient thing to write in Python. The other solutions accept that reality and use list comprehensions or filtering to push the heavy lifting down into the Python runtime. This seems to me to be the right thing to do.

On the other hand, I'm quite certain that my solution would perform a lot better than most if this were C. The inner while loop is O(r), and it mutates the result in place, also in O(r). It does not consume additional stack frames and does not consume any memory besides the result and two variables. But obviously this is not C, so none of this really matters.
Answer 2:
here is my recursive generator (it just skips the (n+1)-th item if the n-th item is selected):
def non_consecutive_combinator(rnge, r, prev=[]):
    if r == 0:
        yield prev
    else:
        for i, item in enumerate(rnge):
            # skip the next element by recursing on rnge[i+2:]
            for next_comb in non_consecutive_combinator(rnge[i+2:], r-1, prev+[item]):
                yield next_comb
print list(non_consecutive_combinator([1,2,3,4], 2))
#[[1, 3], [1, 4], [2, 4]]
print list(non_consecutive_combinator([1,2,3,4,5], 2))
#[[1, 3], [1, 4], [1, 5], [2, 4], [2, 5], [3, 5]]
print list(non_consecutive_combinator(range(1, 10), 3))
#[[1, 3, 5], [1, 3, 6], [1, 3, 7], [1, 3, 8], [1, 3, 9], [1, 4, 6], [1, 4, 7], [1, 4, 8], [1, 4, 9], [1, 5, 7], [1, 5, 8], [1, 5, 9], [1, 6, 8], [1, 6, 9], [1, 7, 9], [2, 4, 6], [2, 4, 7], [2, 4, 8], [2, 4, 9], [2, 5, 7], [2, 5, 8], [2, 5, 9], [2, 6, 8], [2, 6, 9], [2, 7, 9], [3, 5, 7], [3, 5, 8], [3, 5, 9], [3, 6, 8], [3, 6, 9], [3, 7, 9], [4, 6, 8], [4, 6, 9], [4, 7, 9], [5, 7, 9]]
on efficiency:

this code can't be O(1), because traversing the stack and building a new collection upon each iteration won't be O(1). also, a recursive generator means you have to use r maximum stack depth to get an r-item combination. this means with low r, the cost of the calling stacks can be more expensive than non-recursive generation. with enough n and r, it is potentially much more efficient than the itertools-based solution.
i've tested the two pieces of code uploaded in this question:
from collections import deque
from itertools import ifilter, combinations
import timeit

def filtered_combi(n, r):
    def good(combo):
        return not any(combo[i]+1 == combo[i+1] for i in range(len(combo)-1))
    return ifilter(good, combinations(range(1, n+1), r))

def non_consecutive_combinator(rnge, r, prev=[]):
    if r == 0:
        yield prev
    else:
        for i, item in enumerate(rnge):
            for next_comb in non_consecutive_combinator(rnge[i+2:], r-1, prev+[item]):
                yield next_comb

def wrapper(n, r):
    return non_consecutive_combinator(range(1, n+1), r)

def consume(f, *args, **kwargs):
    # exhaust the generator
    deque(f(*args, **kwargs))

t = timeit.timeit(lambda: consume(wrapper, 30, 4), number=100)
f = timeit.timeit(lambda: consume(filtered_combi, 30, 4), number=100)
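the two totals can then be printed side by side (the formatting here is mine):

    print("recursive: %.7f, itertools: %.7f" % (t, f))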
results and more results (edit) (on windows 7, 64-bit python 2.7.3, core i5 ivy bridge with 8gb ram):
(n, r) recursive itertools
----------------------------------------
(30, 4) 1.6728046 4.0149797 100 times 17550 combinations
(20, 4) 2.6734657 7.0835997 1000 times 2380 combinations
(10, 4) 0.1253318 0.3157737 1000 times 35 combinations
(4, 2) 0.0091073 0.0120918 1000 times 3 combinations
(20, 5) 0.6275073 2.4236898 100 times 4368 combinations
(20, 6) 1.0542227 6.1903468 100 times 5005 combinations
(20, 7) 1.3339530 12.4065561 100 times 3432 combinations
(20, 8) 1.4118724 19.9793801 100 times 1287 combinations
(20, 9) 1.4116702 26.1977839 100 times 220 combinations
as you can see, the gap between the recursive solution and the itertools.combinations-based solution gets wider as n goes up. in fact, the gap between the two solutions is heavily dependent on r - a bigger r means you have to throw away more of the combinations generated by itertools.combinations. for example, in the case of n=20, r=9 we filter and keep only 220 combinations out of 167960 (20C9). if n and r are small, using itertools.combinations can be faster, because it is more efficient at low r and does not use the stack, as I explained. (as you can see, itertools is very optimized: if you write your logic with for, if, while and a bunch of generators and list comprehensions, it won't be as fast as the abstracted itertools version. this is one of the reasons people love python - you bring your code to a higher level, and you get rewarded. not many languages do that.)
Answer 3:
If there were a way to generate all combinations in O(1), you could do this in O(r) just by generating and filtering. Assuming itertools.combinations has an O(1) next, there are at most r-1 values to skip over, so the worst case is r-1 times O(1), right?

Jumping ahead a bit to avoid confusion: I don't think there is an O(1) implementation of combinations, and therefore this is not O(r). But is there anything possible that is? I'm not sure. Anyway…
So:
import itertools

def nonconsecutive_combinations(p, r):
    def good(combo):
        return not any(combo[i]+1 == combo[i+1] for i in range(len(combo)-1))
    return filter(good, itertools.combinations(p, r))

r, n = 2, 4
print(list(nonconsecutive_combinations(range(1, n+1), r)))
This prints:
[(1, 3), (1, 4), (2, 4)]
The itertools documentation doesn't guarantee that combinations has an O(1) next. But it seems to me that if there is a possible O(1) algorithm, they'd use it, and if there isn't, you're not going to find one. You could read the source code, or we could time it… Since we're going to do that anyway, let's time the whole thing.
http://pastebin.com/ydk1TMbD has my code, and thkang's code, and a test driver. The times it's printing are the cost of iterating the entire sequence, divided by the length of the sequence.
With n ranging from 4 to 20 and r fixed at 2, we can see that the times for both go down. (Remember, the total time to iterate the sequence is of course going up; it's just sublinear in the total length.) With n ranging from 7 to 20 and r fixed at 4, the same is true.

With n fixed at 12 and r ranging from 1 to 6, the times for both go up linearly from 2 through 5, but they're much higher for 1 and 6 than you'd expect.

On reflection, that makes sense - at r=6 there are only 7 good values out of 924, right? And that's why the time per next was going down as n went up. The total time is going up, but the number of values yielded is going up even faster.
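The expected number of good values follows directly from the mapping in the question: C(n-r+1, r). A quick check (the helper name is mine):

    from math import factorial

    def ncc_count(n, r):
        # non-consecutive r-subsets of {1..n} number C(n-r+1, r)
        m = n - r + 1
        if m < r:
            return 0
        return factorial(m) // (factorial(r) * factorial(m - r))

    print(ncc_count(12, 6))  # 7, out of C(12, 6) = 924 total combinations
    print(ncc_count(20, 9))  # 220, matching the table in thkang's answer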
So, combinations doesn't have an O(1) next; what it does have is something complicated. And my algorithm does not have an O(r) next; it's also something complicated. I think performance guarantees would be a lot easier to specify over the whole iteration than per next (and then it's easy to divide by the number of values if you know how to calculate that).

At any rate, the performance characteristics are exactly the same for the two algorithms I tested. (Oddly, switching the wrapper's return to yield from made the recursive one faster and the filtering one slower… but it's a small constant factor anyway, so who cares?)
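For reference, the yield from variant mentioned above would look like this (Python 3 only):

    def wrapper(n, r):
        # delegate to the recursive generator instead of returning it
        yield from non_consecutive_combinator(range(1, n+1), r)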
Answer 4:
Here's my attempt at a recursive generator:
def combinations_nonseq(r, n):
    if r == 0:
        yield ()
        return
    # 2*r-1 is the smallest value the largest element can take
    for v in range(2*r-1, n+1):
        for c in combinations_nonseq(r-1, v-2):
            yield c + (v,)
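For example, reproducing the question's r=2, n=4 case (note the (r, n) argument order):

    print(list(combinations_nonseq(2, 4)))
    # [(1, 3), (1, 4), (2, 4)]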
This is approximately the same algorithm as thkang's recursive generator, but it has better performance. If n is close to r*2-1 the improvement is very large, while for smaller r values (relative to n) it is a small improvement. It's also a bit better than svckr's code, without a clear connection to the n or r values.

The key insight I had was that when n is less than 2*r-1, there can be no combinations without adjacent values. This lets my generator do less work than thkang's.
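The pruning is easy to observe: when n < 2*r-1, the range(2*r-1, n+1) at the first recursion level is empty, so the generator yields nothing at all rather than recursing and failing late:

    print(list(combinations_nonseq(3, 4)))
    # [] because n=4 < 2*3-1 = 5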
Here are some timings, run using a modified version of thkang's test code. They use the timeit module to find out how much time it takes to consume the whole contents of the generator ten times. The # column shows the number of values yielded by my code (I'm pretty sure all the others are the same).
( n, r) # |abarnert | thkang | svckr |BlckKnght| Knoothe |JFSebastian
===============+=========+=========+=========+=========+=========+========
(16, 2) 105 | 0.0037 | 0.0026 | 0.0064 | 0.0017 | 0.0047 | 0.0020
(16, 4) 715 | 0.0479 | 0.0181 | 0.0281 | 0.0117 | 0.0215 | 0.0143
(16, 6) 462 | 0.2034 | 0.0400 | 0.0191 | 0.0125 | 0.0153 | 0.0361
(16, 8) 9 | 0.3158 | 0.0410 | 0.0005 | 0.0006 | 0.0004 | 0.0401
===============+=========+=========+=========+=========+=========+========
(24, 2) 253 | 0.0054 | 0.0037 | 0.0097 | 0.0022 | 0.0069 | 0.0026
(24, 4) 5985 | 0.2703 | 0.1131 | 0.2337 | 0.0835 | 0.1772 | 0.0811
(24, 6) 27132 | 3.6876 | 0.8047 | 1.0896 | 0.5517 | 0.8852 | 0.6374
(24, 8) 24310 | 19.7518 | 1.7545 | 1.0015 | 0.7019 | 0.8387 | 1.5343
For larger values of n, abarnert's code was taking too long, so I've dropped it from the next tests:
( n, r) # | thkang | svckr |BlckKnght| Knoothe |JFSebastian
===============+=========+=========+=========+=========+========
(32, 2) 465 | 0.0069 | 0.0178 | 0.0040 | 0.0127 | 0.0064
(32, 4) 23751 | 0.4156 | 0.9330 | 0.3176 | 0.7068 | 0.2766
(32, 6) 296010 | 7.1074 | 11.8937 | 5.6699 | 9.7678 | 4.9028
(32, 8)1081575 | 37.8419 | 44.5834 | 27.6628 | 37.7919 | 28.4133
The code I've been testing with is here.
Answer 5:
Here's a solution similar to @thkang's answer, but with an explicit stack:
def combinations_stack(seq, r):
    stack = [(0, r, ())]
    while stack:
        j, r, prev = stack.pop()
        if r == 0:
            yield prev
        else:
            # push in reverse so that tuples pop off in lexicographic order
            for i in range(len(seq)-1, j-1, -1):
                stack.append((i+2, r-1, prev + (seq[i],)))
Example:
print(list(combinations_stack(range(1, 4+1), 2)))
# -> [(1, 3), (1, 4), (2, 4)]
For some (n, r) values it is the fastest solution, according to the benchmark on my machine (the comment column shows the n and r used):
name time ratio comment
combinations_knoothe 17.4 usec 1.00 8 4
combinations_blckknght 17.9 usec 1.03 8 4
combinations_svckr 20.1 usec 1.16 8 4
combinations_stack 62.4 usec 3.59 8 4
combinations_thkang 69.6 usec 4.00 8 4
combinations_abarnert 123 usec 7.05 8 4
name time ratio comment
combinations_stack 965 usec 1.00 16 4
combinations_blckknght 1e+03 usec 1.04 16 4
combinations_knoothe 1.62 msec 1.68 16 4
combinations_thkang 1.64 msec 1.70 16 4
combinations_svckr 1.84 msec 1.90 16 4
combinations_abarnert 3.3 msec 3.42 16 4
name time ratio comment
combinations_stack 18 msec 1.00 32 4
combinations_blckknght 28.1 msec 1.56 32 4
combinations_thkang 40.4 msec 2.25 32 4
combinations_knoothe 53.3 msec 2.96 32 4
combinations_svckr 59.8 msec 3.32 32 4
combinations_abarnert 68.3 msec 3.79 32 4
name time ratio comment
combinations_stack 1.84 sec 1.00 32 8
combinations_blckknght 2.27 sec 1.24 32 8
combinations_svckr 2.83 sec 1.54 32 8
combinations_knoothe 3.08 sec 1.68 32 8
combinations_thkang 3.29 sec 1.79 32 8
combinations_abarnert 22 sec 11.99 32 8
where combinations_knoothe is an implementation of the algorithm described in the question:
import itertools
from itertools import imap as map

def _combinations_knoothe(n, r):
    def y2x(y):
        """
        y_1 = x_1
        y_2 = x_2 - 1
        y_3 = x_3 - 2
        ...
        y_r = x_r - (r-1)
        """
        return tuple(yy + i for i, yy in enumerate(y))
    return map(y2x, itertools.combinations(range(1, n-r+1+1), r))

def combinations_knoothe(seq, r):
    assert seq == list(range(1, len(seq)+1))
    return _combinations_knoothe(len(seq), r)
and other functions are from corresponding answers (modified to accept input in the unified format).
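As a quick check against the question's example (under Python 2, because of the imap import):

    print(list(combinations_knoothe([1, 2, 3, 4], 2)))
    # [(1, 3), (1, 4), (2, 4)]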