I'd like to know if there is a simple (or already created) way of doing the opposite of this: Generate List of Numbers from Hyphenated.... This link could be used to do:
>> list(hyphen_range('1-9,12,15-20,23'))
[1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 15, 16, 17, 18, 19, 20, 23]:
I'm looking to do the opposite (note that 10 and 21 are included so it would be compatible with the range function, where range(1,10)=[1,2,3,4,5,6,7,8,9]):
>> list_to_ranges([1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 15, 16, 17, 18, 19, 20, 23])
'1-10,12,15-21,23'
Eventually, I would like to have the output also incorporate a step where the last number of the output indicates the step:
>> list_to_ranges([1, 3, 5, 7, 8, 9, 10, 11])
'1-13:2,8,10'
Essentially, this would end up being kind of like an "inverse" range function
>> tmp = list_to_ranges([1, 3, 5])
>> print tmp
'1-7:2'
>> range(1, 7, 2)
[1, 3, 5]
My guess is that there is no really easy/simple way to do this, but I thought I would ask on here before I go make some brute force, long method.
EDIT
Using the code from an answer to this post as an example, I came up with a simple way to do the first part. But I think that identifying the patterns to do steps would be a bit harder.
from itertools import groupby
from operator import itemgetter
data = [ 1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28]
print data, '\n'
str_list = []
for k, g in groupby(enumerate(data), lambda (i,x):i-x):
ilist = map(itemgetter(1), g)
print ilist
if len(ilist) > 1:
str_list.append('%d-%d' % (ilist[0], ilist[-1]+1))
else:
str_list.append('%d' % ilist[0])
print '\n', ','.join(str_list)
EDIT 2
Here is my attempt at including the step size...it is pretty close, but the first numbers get repeated. I think that with a little bit of tweaking of this, it will be close to what I want - or at least good enough.
import numpy as np
from itertools import groupby
def list_to_ranges(data):
data = sorted(data)
diff_data = np.diff(data).tolist()
ranges = []
i = 0
for k, iterable in groupby(diff_data, None):
rng = list(iterable)
step = rng[0]
if len(rng) == 1:
ranges.append('%d' % data[i])
elif step == 1:
ranges.append('%d-%d' % (data[i], data[i+len(rng)]+step))
else:
ranges.append('%d-%d:%d' % (data[i], data[i+len(rng)]+step, step))
i += len(rng)
return ','.join(ranges)
data = [1, 3, 5, 6, 7, 11, 13, 15, 16, 17, 18, 19, 22, 25, 28]
print data
data_str = list_to_ranges(data)
print data_str
_list = []
for r in data_str.replace('-',':').split(','):
r = [int(a) for a in r.split(':')]
if len(r) == 1:
_list.extend(r)
elif len(r) == 2:
_list.extend(range(r[0], r[1]))
else:
_list.extend(range(r[0], r[1], r[2]))
print _list
print list(set(_list))
One approach could be "eating" piece by piece the input sequence and store the partial range results untill you've got them all:
I test it with a bunch of unit tests and it passed them all, it can handle negative numbers too, but they'll look kind of ugly (it's really anybody's fault).
Example:
Note: I wrote the code for Python 3.
Performance
I didn't put any performance effort in the solution above. In particular, every time a list get re-builded with slicing, it might take some time if the input list has a particular shape. So, the first simple improvement would be using
itertools.islice()
where possible.Anyway here's another implementation of the same algorithm, that scan through the input list with a
scan
index instead of slicing:I stopped working on it once it got ~65% faster than the previous top solution, it seemed enough :)
Anyway I'd say that there might still be room for improvement (expecially in the middle for-loop).
This is most likely what you are looking for.
Edit: I see you already found the post. My apologies.
To help with the second part, I've tinkered a bit myself. This is what I came up with:
For the given input list, this gives:
['1-7:2', '8-11:1', '13-17:2']
. The code could do with a bit of cleanup, but this sorts with your problem assuming the grouping can be done sequentially.This function should do what you need without requiring any imports.
This is a comparison of the 3 methods. Change the amount of data and the density via the values below...no matter what values I use, the first solution seems to be the quickest for me. For very large sets of data, the third solution becomes very slow.
EDITED
Edited to include comments below and add in a new solution. The last solution seems to be the quickest now.