I have a Python script which takes as input a list of integers, which I need to work with four integers at a time. Unfortunately, I don't have control of the input, or I'd have it passed in as a list of four-element tuples. Currently, I'm iterating over it this way:
for i in xrange(0, len(ints), 4):
# dummy op for example code
foo += ints[i] * ints[i + 1] + ints[i + 2] * ints[i + 3]
It looks a lot like "C-think", though, which makes me suspect there's a more pythonic way of dealing with this situation. The list is discarded after iterating, so it needn't be preserved. Perhaps something like this would be better?
while ints:
foo += ints[0] * ints[1] + ints[2] * ints[3]
ints[0:4] = []
Still doesn't quite "feel" right, though. :-/
Related question: How do you split a list into evenly sized chunks in Python?
I needed a solution that would also work with sets and generators. I couldn't come up with anything very short and pretty, but it's quite readable at least.
List:
Set:
Generator:
Yet another answer, the advantages of which are:
1) Easily understandable
2) Works on any iterable, not just sequences (some of the above answers will choke on filehandles)
3) Does not load the chunk into memory all at once
4) Does not make a chunk-long list of references to the same iterator in memory
5) No padding of fill values at the end of the list
That being said, I haven't timed it so it might be slower than some of the more clever methods, and some of the advantages may be irrelevant given the use case.
Update:
A couple of drawbacks due to the fact the inner and outer loops are pulling values from the same iterator:
1) continue doesn't work as expected in the outer loop - it just continues on to the next item rather than skipping a chunk. However, this doesn't seem like a problem as there's nothing to test in the outer loop.
2) break doesn't work as expected in the inner loop - control will wind up in the inner loop again with the next item in the iterator. To skip whole chunks, either wrap the inner iterator (ii above) in a tuple, e.g.
for c in tuple(ii)
, or set a flag and exhaust the iterator.Here is a chunker without imports that supports generators:
Example of use:
There doesn't seem to be a pretty way to do this. Here is a page that has a number of methods, including:
The ideal solution for this problem works with iterators (not just sequences). It should also be fast.
This is the solution provided by the documentation for itertools:
Using ipython's
%timeit
on my mac book air, I get 47.5 us per loop.However, this really doesn't work for me since the results are padded to be even sized groups. A solution without the padding is slightly more complicated. The most naive solution might be:
Simple, but pretty slow: 693 us per loop
The best solution I could come up with uses
islice
for the inner loop:With the same dataset, I get 305 us per loop.
Unable to get a pure solution any faster than that, I provide the following solution with an important caveat: If your input data has instances of
filldata
in it, you could get wrong answer.I really don't like this answer, but it is significantly faster. 124 us per loop
About solution gave by
J.F. Sebastian
here:It's clever, but has one disadvantage - always return tuple. How to get string instead?
Of course you can write
''.join(chunker(...))
, but the temporary tuple is constructed anyway.You can get rid of the temporary tuple by writing own
zip
, like this:Then
Example usage: