Question:
I am trying to create a generator (iterator which supports doing a next, perhaps using yield in python) which gives all combinations of r elements from {1,2,...n} (n and r are parameters) such that in the selected r elements, no two are consecutive.
For example, for r = 2 and n = 4, the generated combinations are {1,3}, {1,4}, {2,4}.
I could generate all combinations (as an iterator) and filter out those which don't satisfy the criteria, but that would be doing unnecessary work.
Is there some generation algorithm where next is O(1) (and if that is not possible, O(r) or O(n))?
The order in which the sets are returned is not relevant (and hopefully will allow an O(1) algorithm).
Note: I have tagged it python, but a language-agnostic algorithm will help too.
Update:
I have found a way to map it to generating pure combinations! A web search reveals that O(1) is possible for combinations (though it seems complicated).
Here is the mapping.
Suppose we have a combination x_1, x_2, ..., x_r with x_1 + 1 < x_2, x_2 + 1 < x_3, ... We map it to y_1, y_2, ..., y_r as follows:

y_1 = x_1
y_2 = x_2 - 1
y_3 = x_3 - 2
...
y_r = x_r - (r-1)

This way we have y_1 < y_2 < ... < y_r, without the non-consecutive constraint!
This basically amounts to choosing r elements out of n-r+1. Thus all I need to do is run the generation for (n-r+1 choose r).
For our purposes, using the mapping after things are generated is good enough.
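For concreteness, here is a minimal sketch of that mapping in Python (the function name is mine; Answer 5 below implements the same idea as combinations_knoothe):

    import itertools

    def nonconsecutive(n, r):
        # choose r of the n-r+1 "compressed" values, then undo the mapping:
        # x_i = y_i + (i-1), which re-inserts the gaps between elements
        for y in itertools.combinations(range(1, n - r + 2), r):
            yield tuple(y_i + i for i, y_i in enumerate(y))

    print(list(nonconsecutive(4, 2)))
    # [(1, 3), (1, 4), (2, 4)]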
Reasons for choosing svckr's answer
All great answers, but I have chosen svckr's answer. Here are some reasons why:
It is effectively stateless (or "Markovian", to be more precise). The next combination can be generated from the previous one. It is in a way almost optimal: O(r) space and time.
It is predictable. We know exactly the order (lexicographic) in which the combinations are generated.
These two properties make it easy to parallelise the generation (split at predictable points and delegate), with fault tolerance thrown in (can pick off from the last generated combination if a CPU/machine fails)!
Sorry, parallelisation was not mentioned earlier; it didn't occur to me when I wrote the question, and I only got the idea later.
Answer 1:
This is fun! How about this:
def nonconsecutive_combinations(r, n):
    # first combination, starting at 1, step size 2
    combination = list(range(1, r*2, 2))
    # as long as all items are less than or equal to n
    while combination[r-1] <= n:
        yield tuple(combination)
        p = r-1  # pointer to the element we will increment
        a = 1    # number that will be added to the last element
        # find the rightmost element we can increment without
        # making the last element bigger than n
        while p > 0 and combination[p] + a > n:
            p -= 1
            a += 2
        # increment the item and
        # fill the tail of the list with increments of two
        combination[p:] = range(combination[p]+1, combination[p] + 2*(r-p), 2)
Each next() call should be O(r).
I got the idea while thinking about how this would translate to natural numbers, but it took quite some time to get the details right.
> list(nonconsecutive_combinations(2, 4))
[(1, 3), (1, 4), (2, 4)]
> list(nonconsecutive_combinations(3, 6))
[(1, 3, 5), (1, 3, 6), (1, 4, 6), (2, 4, 6)]
Let me try to explain how this works.
Conditions for a tuple c with r elements to be part of the result set:

1. Every element of the tuple is at least as large as the preceding element plus 2 (c[x] >= c[x-1] + 2).
2. All elements are less than or equal to n. Because of 1., it is sufficient to say that the last element is less than or equal to n (c[r-1] <= n).
The smallest tuple that may be part of the result set is (1, 3, 5, ..., 2*r-1). When I say a tuple is "smaller" than another, I am assuming the lexicographical order. As Blckknght points out, even the smallest possible tuple may be too large to satisfy condition 2.
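As a quick sanity check (this helper is mine, not part of the answer's algorithm), the two conditions can be written as a predicate:

    def is_valid(c, n):
        # condition 1: each element exceeds its predecessor by at least 2
        # condition 2: the last element is at most n
        return all(c[x] >= c[x-1] + 2 for x in range(1, len(c))) and c[-1] <= n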
The function above contains two while loops:
The outer loop steps through the results and assumes they appear in lexicographical order and satisfy condition one. As soon as the tuple in question violates condition two, we know that we have exhausted the result set and are done:
    combination = list(range(1, r*2, 2))
    while combination[r-1] <= n:

The first line initializes the result tuple with the first possible result according to condition one. The second line translates directly into condition two.
The inner loop finds the next possible tuple satisfying condition one.
    yield tuple(combination)

Since the while condition (condition two) is true and we made sure the result satisfies condition one, we can yield the current result tuple.
Next, to find the lexicographically next tuple, we would add "1" to the last element.
# we don't actually do this:
combination[r-1] += 1
However, that may break condition two too early. So, if that operation would break condition two, we increment the preceding element and adjust the last element accordingly. This is a little like counting integers in base 10: "If the last digit is larger than 9, increment the previous digit and make the last digit a 0." But instead of adding zeros, we fill the tuple so that condition one is true.
# if above does not work
combination[r-2] += 1
combination[r-1] = combination[r-2] + 2
The problem is that the second line may break condition two yet again. So what we actually do is keep track of what the last element would become; that is what the variable a is for. We also use the variable p to refer to the index of the current element we are looking at.
    p = r-1
    a = 1
    while p > 0 and combination[p] + a > n:
        p -= 1
        a += 2
We are iterating right-to-left (p = r-1, p -= 1) through the items of the result tuple. Initially we want to add one to the last item (a = 1), but when stepping through the tuple we actually want to replace the last item with the value of a preceding item plus 2*x, where x is the distance between the two items (a += 2, combination[p] + a).
Finally, we have found the item we want to increment, and fill the rest of the tuple with a sequence starting at the incremented item, with a step size of 2:
combination[p:] = range(combination[p]+1, combination[p] + 2*(r-p), 2)
And that's it. It seemed so easy when I first thought of it, but all the arithmetic throughout the function makes it a great place for off-by-one errors, and describing it is harder than it should be. I should have known I was in trouble when I added that inner loop :)
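To see both loops in action, here is the r=3, n=6 sequence from above, stepped one next() at a time:

    g = nonconsecutive_combinations(3, 6)
    print(next(g))  # (1, 3, 5)  the smallest possible tuple
    print(next(g))  # (1, 3, 6)  last element incremented (p=2, a=1)
    print(next(g))  # (1, 4, 6)  p moved left once, tail refilled
    print(next(g))  # (2, 4, 6)  p moved all the way to the first element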
On performance…

Unfortunately, while loops filled with arithmetic are not the most efficient thing to write in Python. The other solutions accept that reality and use list comprehensions or filtering to push the heavy lifting down into the Python runtime. This seems to me to be the right thing to do.

On the other hand, I'm quite certain that my solution would perform a lot better than most if this were C. The inner while loop is O(r), and it mutates the result in place, also in O(r). It does not consume additional stack frames and does not consume any memory besides the result and two variables. But obviously this is not C, so none of this really matters.
Answer 2:
here is my recursive generator (it just skips the (n+1)-th item if the n-th item is selected):
def non_consecutive_combinator(rnge, r, prev=[]):
    if r == 0:
        yield prev
    else:
        for i, item in enumerate(rnge):
            # skip the next element by recursing on rnge[i+2:]
            for next_comb in non_consecutive_combinator(rnge[i+2:], r-1, prev+[item]):
                yield next_comb
print list(non_consecutive_combinator([1,2,3,4], 2))
#[[1, 3], [1, 4], [2, 4]]
print list(non_consecutive_combinator([1,2,3,4,5], 2))
#[[1, 3], [1, 4], [1, 5], [2, 4], [2, 5], [3, 5]]
print list(non_consecutive_combinator(range(1, 10), 3))
#[[1, 3, 5], [1, 3, 6], [1, 3, 7], [1, 3, 8], [1, 3, 9], [1, 4, 6], [1, 4, 7], [1, 4, 8], [1, 4, 9], [1, 5, 7], [1, 5, 8], [1, 5, 9], [1, 6, 8], [1, 6, 9], [1, 7, 9], [2, 4, 6], [2, 4, 7], [2, 4, 8], [2, 4, 9], [2, 5, 7], [2, 5, 8], [2, 5, 9], [2, 6, 8], [2, 6, 9], [2, 7, 9], [3, 5, 7], [3, 5, 8], [3, 5, 9], [3, 6, 8], [3, 6, 9], [3, 7, 9], [4, 6, 8], [4, 6, 9], [4, 7, 9], [5, 7, 9]]
on efficiency:

this code can't be O(1), because traversing the stack and building a new collection upon each iteration won't be O(1). also, a recursive generator means you have to use r maximum stack depth to get an r-item combination. this means with low r, the cost of the calling stacks can be more expensive than non-recursive generation. with enough n and r, it is potentially much more efficient than the itertools-based solution.
i've tested the two pieces of code uploaded in this question:
from collections import deque
from itertools import ifilter, combinations
import timeit

def filtered_combi(n, r):
    def good(combo):
        return not any(combo[i]+1 == combo[i+1] for i in range(len(combo)-1))
    return ifilter(good, combinations(range(1, n+1), r))

def non_consecutive_combinator(rnge, r, prev=[]):
    if r == 0:
        yield prev
    else:
        for i, item in enumerate(rnge):
            for next_comb in non_consecutive_combinator(rnge[i+2:], r-1, prev+[item]):
                yield next_comb

def wrapper(n, r):
    return non_consecutive_combinator(range(1, n+1), r)

def consume(f, *args, **kwargs):
    # exhaust the generator
    deque(f(*args, **kwargs))

t = timeit.timeit(lambda: consume(wrapper, 30, 4), number=100)
f = timeit.timeit(lambda: consume(filtered_combi, 30, 4), number=100)
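the two totals can then be printed side by side (the formatting here is mine):

    print("recursive: %.7f, itertools: %.7f" % (t, f))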
results and more results (edit) (on windows 7, 64-bit python 2.7.3, core i5 ivy bridge with 8gb ram):
(n, r) recursive itertools
----------------------------------------
(30, 4) 1.6728046 4.0149797 100 times 17550 combinations
(20, 4) 2.6734657 7.0835997 1000 times 2380 combinations
(10, 4) 0.1253318 0.3157737 1000 times 35 combinations
(4, 2) 0.0091073 0.0120918 1000 times 3 combinations
(20, 5) 0.6275073 2.4236898 100 times 4368 combinations
(20, 6) 1.0542227 6.1903468 100 times 5005 combinations
(20, 7) 1.3339530 12.4065561 100 times 3432 combinations
(20, 8) 1.4118724 19.9793801 100 times 1287 combinations
(20, 9) 1.4116702 26.1977839 100 times 220 combinations
as you can see, the gap between the recursive solution and the itertools.combinations-based solution gets wider as n goes up. in fact, the gap between the two solutions is heavily dependent on r - a bigger r means you have to throw away more of the combinations generated by itertools.combinations. for example, in the case of n=20, r=9 we filter and keep only 220 combinations out of 167960 (20C9). if n and r are small, using itertools.combinations can be faster, because it is more efficient at low r and does not use the stack, as I explained. (as you can see, itertools is very optimized: if you write your logic with for, if, while and a bunch of generators and list comprehensions, it won't be as fast as the abstracted itertools version. this is one of the reasons people love python - you bring your code to a higher level, and you get rewarded. not many languages do that.)
Answer 3:
If there were a way to generate all combinations in O(1), you could do this in O(r) just by generating and filtering. Assuming itertools.combinations has an O(1) next, there are at most r-1 values to skip over, so the worst case is r-1 times O(1), right?

Jumping ahead a bit to avoid confusion: I don't think there is an O(1) implementation of combinations, and therefore this is not O(r). But is there anything possible that is? I'm not sure. Anyway…
So:
import itertools

def nonconsecutive_combinations(p, r):
    def good(combo):
        return not any(combo[i]+1 == combo[i+1] for i in range(len(combo)-1))
    return filter(good, itertools.combinations(p, r))

r, n = 2, 4
print(list(nonconsecutive_combinations(range(1, n+1), r)))
This prints:
[(1, 3), (1, 4), (2, 4)]
The itertools documentation doesn't guarantee that combinations has an O(1) next. But it seems to me that if there is a possible O(1) algorithm, they'd use it, and if there isn't, you're not going to find one. You could read the source code, or we could time it… Since we're going to do that anyway, let's time the whole thing.
http://pastebin.com/ydk1TMbD has my code, and thkang's code, and a test driver. The times it's printing are the cost of iterating the entire sequence, divided by the length of the sequence.
With n ranging from 4 to 20 and r fixed at 2, we can see that the times for both go down. (Remember, the total time to iterate the sequence is of course going up; it's just sublinear in the total length.) With n ranging from 7 to 20 and r fixed at 4, the same is true.

With n fixed at 12 and r ranging from 1 to 6, the times for both go up linearly from 2 through 5, but they're much higher for 1 and 6 than you'd expect.

On reflection, that makes sense - at r=6 there are only 7 good values out of 924, right? And that's why the time per next was going down as n went up. The total time is going up, but the number of values yielded is going up even faster.
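The expected number of good values follows directly from the mapping in the question: C(n-r+1, r). A quick check (the helper name is mine):

    from math import factorial

    def ncc_count(n, r):
        # non-consecutive r-subsets of {1..n} number C(n-r+1, r)
        m = n - r + 1
        if m < r:
            return 0
        return factorial(m) // (factorial(r) * factorial(m - r))

    print(ncc_count(12, 6))  # 7, out of C(12, 6) = 924 total combinations
    print(ncc_count(20, 9))  # 220, matching the table in thkang's answer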
So, combinations doesn't have an O(1) next; what it does have is something complicated. And my algorithm does not have an O(r) next; it's also something complicated. I think performance guarantees would be a lot easier to specify over the whole iteration than per next (and then it's easy to divide by the number of values if you know how to calculate that).

At any rate, the performance characteristics are exactly the same for the two algorithms I tested. (Oddly, switching the wrapper's return to yield from made the recursive one faster and the filtering one slower… but it's a small constant factor anyway, so who cares?)
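For reference, the yield from variant mentioned above would look like this (Python 3 only):

    def wrapper(n, r):
        # delegate to the recursive generator instead of returning it
        yield from non_consecutive_combinator(range(1, n+1), r)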
Answer 4:
Here's my attempt at a recursive generator:
def combinations_nonseq(r, n):
    if r == 0:
        yield ()
        return
    # 2*r-1 is the smallest value the largest element can take
    for v in range(2*r-1, n+1):
        for c in combinations_nonseq(r-1, v-2):
            yield c + (v,)
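For example, reproducing the question's r=2, n=4 case (note the (r, n) argument order):

    print(list(combinations_nonseq(2, 4)))
    # [(1, 3), (1, 4), (2, 4)]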
This is approximately the same algorithm as thkang's recursive generator, but it has better performance. If n is close to r*2-1 the improvement is very large, while for smaller r values (relative to n) it is a small improvement. It's also a bit better than svckr's code, without a clear connection to the n or r values.

The key insight I had was that when n is less than 2*r-1, there can be no combinations without adjacent values. This lets my generator do less work than thkang's.
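The pruning is easy to observe: when n < 2*r-1, the range(2*r-1, n+1) at the first recursion level is empty, so the generator yields nothing at all rather than recursing and failing late:

    print(list(combinations_nonseq(3, 4)))
    # [] because n=4 < 2*3-1 = 5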
Here are some timings, run using a modified version of thkang's test code. They use the timeit module to find out how much time it takes to consume the whole contents of the generator ten times. The # column shows the number of values yielded by my code (I'm pretty sure all the others are the same).
( n, r) # |abarnert | thkang | svckr |BlckKnght| Knoothe |JFSebastian
===============+=========+=========+=========+=========+=========+========
(16, 2) 105 | 0.0037 | 0.0026 | 0.0064 | 0.0017 | 0.0047 | 0.0020
(16, 4) 715 | 0.0479 | 0.0181 | 0.0281 | 0.0117 | 0.0215 | 0.0143
(16, 6) 462 | 0.2034 | 0.0400 | 0.0191 | 0.0125 | 0.0153 | 0.0361
(16, 8) 9 | 0.3158 | 0.0410 | 0.0005 | 0.0006 | 0.0004 | 0.0401
===============+=========+=========+=========+=========+=========+========
(24, 2) 253 | 0.0054 | 0.0037 | 0.0097 | 0.0022 | 0.0069 | 0.0026
(24, 4) 5985 | 0.2703 | 0.1131 | 0.2337 | 0.0835 | 0.1772 | 0.0811
(24, 6) 27132 | 3.6876 | 0.8047 | 1.0896 | 0.5517 | 0.8852 | 0.6374
(24, 8) 24310 | 19.7518 | 1.7545 | 1.0015 | 0.7019 | 0.8387 | 1.5343
For larger values of n, abarnert's code was taking too long, so I've dropped it from the next tests:
( n, r) # | thkang | svckr |BlckKnght| Knoothe |JFSebastian
===============+=========+=========+=========+=========+========
(32, 2) 465 | 0.0069 | 0.0178 | 0.0040 | 0.0127 | 0.0064
(32, 4) 23751 | 0.4156 | 0.9330 | 0.3176 | 0.7068 | 0.2766
(32, 6) 296010 | 7.1074 | 11.8937 | 5.6699 | 9.7678 | 4.9028
(32, 8)1081575 | 37.8419 | 44.5834 | 27.6628 | 37.7919 | 28.4133
The code I've been testing with is here.
Answer 5:
Here's a solution similar to @thkang's answer, but with an explicit stack:
def combinations_stack(seq, r):
    stack = [(0, r, ())]
    while stack:
        j, r, prev = stack.pop()
        if r == 0:
            yield prev
        else:
            # push in reverse so that tuples pop off in lexicographic order
            for i in range(len(seq)-1, j-1, -1):
                stack.append((i+2, r-1, prev + (seq[i],)))
Example:
print(list(combinations_stack(range(1, 4+1), 2)))
# -> [(1, 3), (1, 4), (2, 4)]
For some (n, r) values it is the fastest solution, according to the benchmark on my machine (the comment column shows the n and r used):
name time ratio comment
combinations_knoothe 17.4 usec 1.00 8 4
combinations_blckknght 17.9 usec 1.03 8 4
combinations_svckr 20.1 usec 1.16 8 4
combinations_stack 62.4 usec 3.59 8 4
combinations_thkang 69.6 usec 4.00 8 4
combinations_abarnert 123 usec 7.05 8 4
name time ratio comment
combinations_stack 965 usec 1.00 16 4
combinations_blckknght 1e+03 usec 1.04 16 4
combinations_knoothe 1.62 msec 1.68 16 4
combinations_thkang 1.64 msec 1.70 16 4
combinations_svckr 1.84 msec 1.90 16 4
combinations_abarnert 3.3 msec 3.42 16 4
name time ratio comment
combinations_stack 18 msec 1.00 32 4
combinations_blckknght 28.1 msec 1.56 32 4
combinations_thkang 40.4 msec 2.25 32 4
combinations_knoothe 53.3 msec 2.96 32 4
combinations_svckr 59.8 msec 3.32 32 4
combinations_abarnert 68.3 msec 3.79 32 4
name time ratio comment
combinations_stack 1.84 sec 1.00 32 8
combinations_blckknght 2.27 sec 1.24 32 8
combinations_svckr 2.83 sec 1.54 32 8
combinations_knoothe 3.08 sec 1.68 32 8
combinations_thkang 3.29 sec 1.79 32 8
combinations_abarnert 22 sec 11.99 32 8
where combinations_knoothe is an implementation of the algorithm described in the question:
import itertools
from itertools import imap as map

def _combinations_knoothe(n, r):
    def y2x(y):
        """
        y_1 = x_1
        y_2 = x_2 - 1
        y_3 = x_3 - 2
        ...
        y_r = x_r - (r-1)
        """
        return tuple(yy + i for i, yy in enumerate(y))
    return map(y2x, itertools.combinations(range(1, n-r+1+1), r))

def combinations_knoothe(seq, r):
    assert seq == list(range(1, len(seq)+1))
    return _combinations_knoothe(len(seq), r)
and other functions are from corresponding answers (modified to accept input in the unified format).
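As a quick check against the question's example (under Python 2, because of the imap import):

    print(list(combinations_knoothe([1, 2, 3, 4], 2)))
    # [(1, 3), (1, 4), (2, 4)]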