How to create an iterator that produces items wher

2019-07-09 20:17发布

问题:

I have created a script that uses the following code to iterate over all combinations of characters in the sCharacters string:

sCharacters = "abcdefghijklmnopqrstuvwxyz0123456789"
iKeyLength = len(sCharacters)

for sCharacterCombination in itertools.product(sCharacters, repeat=iKeyLength):
    # Do Stuff Here

However I am only interested in combinations where no single character is represented more than n number of times in the sCharacterCombination. Eg; I want to filter out strings like aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab and only get ones like 7efad3ca411bf57f1df57c0b4e9282aa

I tried just checking each sCharacterCombination, but this is no good as I still have to iterate over a mountain of items I will never use.

How can I get the iterator to create the list based on each item having no single character represented more than n number of times in the first place, so I don't have to iterate over items I will not use?

It would be awesome if I could say:

  • Max number of times a single character could be represented in the sCharacterCombination.
  • Max times a single character could be represented in a row.

That would allow me to say a single character could appear in the sCharacterCombination a max of four times, but it could not be in a row more than twice. E.g. This is ok 1121... but this is not 1112....

Thank you for your time.

回答1:

Here's some code that's more efficient than your current approach.

First, we use itertool's combinations_with_replacement function to create combinations of the desired length, filtering out combinations that have more than the desired number of repetitions. Then we permute each combination; the permutation algorithm I use (created by the 14th century Indian mathematician Narayana Pandita) handles repetitions properly, unlike the one in itertools. Then we use itertool's groupby to filter out permutations that contain runs of identical characters that are greater than the permitted run length.

I've included two functions: permutations_with_limited_repetition limits the length of runs of identical characters; permutations_with_repetition does not.

Note that the input sequence must be sorted from low to high or this algorithm will not function correctly.

from itertools import combinations_with_replacement, groupby

def lexico_permute(a):
    a = list(a)
    yield a
    n = len(a) - 1
    while True:
        for j in range(n-1, -1, -1):
            if a[j] < a[j + 1]:
                break
        else:
            return

        v = a[j]
        for k in range(n, j, -1):
            if v < a[k]:
                break

        a[j], a[k] = a[k], a[j]
        a[j+1:] = a[j+1:][::-1]
        yield a

def permutations_with_repetition(seq, length, maxrepeat): 
    for combo in combinations_with_replacement(seq, length):
        if any(combo.count(c) > maxrepeat for c in combo):
            continue
        yield from lexico_permute(combo)

def permutations_with_limited_repetition(seq, length, maxrepeat, maxrun): 
    for combo in combinations_with_replacement(seq, length):
        if any(combo.count(c) > maxrepeat for c in combo):
            continue
        for perm in lexico_permute(combo):
            if any(len(list(g)) > maxrun for _, g in groupby(perm)):
                continue
            yield perm

# Test

src = list('abcde')
for lst in permutations_with_limited_repetition(src, 3, 2, 1):
    print(''.join(lst))

output

aba
aca
ada
aea
bab
abc
acb
bac
bca
cab
cba
abd
adb
bad
bda
dab
dba
abe
aeb
bae
bea
eab
eba
cac
acd
adc
cad
cda
dac
dca
ace
aec
cae
cea
eac
eca
dad
ade
aed
dae
dea
ead
eda
eae
bcb
bdb
beb
cbc
bcd
bdc
cbd
cdb
dbc
dcb
bce
bec
cbe
ceb
ebc
ecb
dbd
bde
bed
dbe
deb
ebd
edb
ebe
cdc
cec
dcd
cde
ced
dce
dec
ecd
edc
ece
ded
ede

For a commented (non-generator) version of the permutation algorithm, please see this answer I wrote last year.


Update

The first filter

if any(combo.count(c) > maxrepeat for c in combo):

can be made more efficient by using the same groupby technique as the second filter:

if any(len(list(g)) > maxrepeat for _, g in groupby(combo)):

(I should have thought of that yesterday, but I was originally not going to do that second filter, it was a last-minute inspiration).