I have created a script that uses the following code to iterate over all combinations of characters in the sCharacters string:
sCharacters = "abcdefghijklmnopqrstuvwxyz0123456789"
iKeyLength = len(sCharacters)
for sCharacterCombination in itertools.product(sCharacters, repeat=iKeyLength):
# Do Stuff Here
However I am only interested in combinations where no single character is represented more than n number of times in the sCharacterCombination.
Eg; I want to filter out strings like
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab
and only get ones like
7efad3ca411bf57f1df57c0b4e9282aa
I tried just checking each sCharacterCombination
, but this is no good as I still have to iterate over a mountain of items I will never use.
How can I get the iterator to create the list based on each item having no single character represented more than n
number of times in the first place, so I don't have to iterate over items I will not use?
It would be awesome if I could say:
- Max number of times a single character could be represented in the sCharacterCombination.
- Max times a single character could be represented in a row.
That would allow me to say a single character could appear in the sCharacterCombination a max of four times, but it could not be in a row more than twice. E.g. This is ok 1121...
but this is not 1112...
.
Thank you for your time.
Here's some code that's more efficient than your current approach.
First, we use itertool's combinations_with_replacement
function to create combinations of the desired length, filtering out combinations that have more than the desired number of repetitions. Then we permute each combination; the permutation algorithm I use (created by the 14th century Indian mathematician Narayana Pandita) handles repetitions properly, unlike the one in itertools. Then we use itertool's groupby to filter out permutations that contain runs of identical characters that are greater than the permitted run length.
I've included two functions: permutations_with_limited_repetition
limits the length of runs of identical characters; permutations_with_repetition
does not.
Note that the input sequence must be sorted from low to high or this algorithm will not function correctly.
from itertools import combinations_with_replacement, groupby
def lexico_permute(a):
a = list(a)
yield a
n = len(a) - 1
while True:
for j in range(n-1, -1, -1):
if a[j] < a[j + 1]:
break
else:
return
v = a[j]
for k in range(n, j, -1):
if v < a[k]:
break
a[j], a[k] = a[k], a[j]
a[j+1:] = a[j+1:][::-1]
yield a
def permutations_with_repetition(seq, length, maxrepeat):
for combo in combinations_with_replacement(seq, length):
if any(combo.count(c) > maxrepeat for c in combo):
continue
yield from lexico_permute(combo)
def permutations_with_limited_repetition(seq, length, maxrepeat, maxrun):
for combo in combinations_with_replacement(seq, length):
if any(combo.count(c) > maxrepeat for c in combo):
continue
for perm in lexico_permute(combo):
if any(len(list(g)) > maxrun for _, g in groupby(perm)):
continue
yield perm
# Test
src = list('abcde')
for lst in permutations_with_limited_repetition(src, 3, 2, 1):
print(''.join(lst))
output
aba
aca
ada
aea
bab
abc
acb
bac
bca
cab
cba
abd
adb
bad
bda
dab
dba
abe
aeb
bae
bea
eab
eba
cac
acd
adc
cad
cda
dac
dca
ace
aec
cae
cea
eac
eca
dad
ade
aed
dae
dea
ead
eda
eae
bcb
bdb
beb
cbc
bcd
bdc
cbd
cdb
dbc
dcb
bce
bec
cbe
ceb
ebc
ecb
dbd
bde
bed
dbe
deb
ebd
edb
ebe
cdc
cec
dcd
cde
ced
dce
dec
ecd
edc
ece
ded
ede
For a commented (non-generator) version of the permutation algorithm, please see this answer I wrote last year.
Update
The first filter
if any(combo.count(c) > maxrepeat for c in combo):
can be made more efficient by using the same groupby
technique as the second filter:
if any(len(list(g)) > maxrepeat for _, g in groupby(combo)):
(I should have thought of that yesterday, but I was originally not going to do that second filter, it was a last-minute inspiration).