I have a more basic Run Length Encoding question compared to many of the questions about this topic that have already been answered. Essentially, I'm trying to take the string
string = 'aabccccaaa'
and have it return
a2b1c4a3
I thought that if I can manage to get all the information into a list like I have illustrated below, I would easily be able to return a2b1c4a3
test = [['a','a'], ['b'], ['c','c','c','c'], ['a','a','a']]
I came up with the following code so far, but was wondering if someone would be able to help me figure out how to make it create the output I illustrated above.
def string_compression():
for i in xrange(len(string)):
prev_item, current_item = string[i-1], string[i]
print prev_item, current_item
if prev_item == current_item:
<HELP>
If anyone has any additional comments regarding more efficient ways to go about solving a question like this I am all ears!
You can use itertools.groupby()
:
from itertools import groupby
grouped = [list(g) for k, g in groupby(string)]
This will produce your per-letter groups as a list of lists.
You can turn that into a RLE in one step:
rle = ''.join(['{}{}'.format(k, sum(1 for _ in g)) for k, g in groupby(string)])
Each k
is the letter being grouped, each g
an iterator producing N times the same letter; the sum(1 for _ in g)
expression counts those in the most efficient way possible.
Demo:
>>> from itertools import groupby
>>> string = 'aabccccaaa'
>>> [list(g) for k, g in groupby(string)]
[['a', 'a'], ['b'], ['c', 'c', 'c', 'c'], ['a', 'a', 'a']]
>>> ''.join(['{}{}'.format(k, sum(1 for _ in g)) for k, g in groupby(string)])
'a2b1c4a3'
Consider using the more_itertools.run_length
tool.
Demo
import more_itertools as mit
iterable = "aabccccaaa"
list(mit.run_length.encode(iterable))
# [('a', 2), ('b', 1), ('c', 4), ('a', 3)]
Code
"".join(x[0] + str(x[1]) for x in mit.run_length.encode(iterable))
# 'a2b1c4a3'
Alternative itertools/functional style:
"".join(map(str, it.chain.from_iterable(x for x in mit.run_length.encode(iterable))))
# 'a2b1c4a3'
Note: more_itertools
is a third-party library that installable via pip install more_itertools
.