I have found a solution but it is really slow:
def chunks(self, data, SIZE=10000):
    for i in xrange(0, len(data), SIZE):
        yield dict(data.items()[i:i+SIZE])
Do you have any ideas how to do this without using external modules (numpy, etc.)?
Since the dictionary is so big, it is better to keep all of the items involved as iterators and generators, like this:
from itertools import islice
def chunks(data, SIZE=10000):
    it = iter(data)
    for i in xrange(0, len(data), SIZE):
        yield {k: data[k] for k in islice(it, SIZE)}
Sample run:
for item in chunks({i: i for i in xrange(10)}, 3):
    print item
Output
{0: 0, 1: 1, 2: 2}
{3: 3, 4: 4, 5: 5}
{8: 8, 6: 6, 7: 7}
{9: 9}
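If you are on Python 3, the same idea should carry over with only minor changes (range instead of xrange, print() as a function); a minimal sketch:

from itertools import islice

def chunks(data, SIZE=10000):
    it = iter(data)                      # one shared iterator over the keys
    for i in range(0, len(data), SIZE):
        # pull the next SIZE keys off the shared iterator and look up their values
        yield {k: data[k] for k in islice(it, SIZE)}

for item in chunks({i: i for i in range(10)}, 3):
    print(item)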
Another method is zipping iterators:
>>> from itertools import izip_longest, ifilter
>>> d = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5, 'f':6, 'g':7, 'h':8}
Create a list with copies of the dict iterator (the number of copies is the number of elements in each result dict). Passing each iterator from the chunks list to izip_longest gives you the needed number of elements from the source dict (ifilter is used to remove the None padding from the zip results). A generator expression keeps memory usage low:
>>> chunks = [d.iteritems()]*3
>>> g = (dict(ifilter(None, v)) for v in izip_longest(*chunks))
>>> list(g)
[{'a': 1, 'c': 3, 'b': 2},
{'e': 5, 'd': 4, 'g': 7},
{'h': 8, 'f': 6}]
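On Python 3, a rough equivalent of the same zipping trick (izip_longest/ifilter become zip_longest and the built-in filter, and d.items() is wrapped in iter() so all three copies share one iterator) would be:

>>> from itertools import zip_longest
>>> d = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6, 'g': 7, 'h': 8}
>>> chunks = [iter(d.items())] * 3
>>> g = (dict(filter(None, v)) for v in zip_longest(*chunks))
>>> list(g)
[{'a': 1, 'b': 2, 'c': 3}, {'d': 4, 'e': 5, 'f': 6}, {'g': 7, 'h': 8}]

The exact grouping shown relies on dicts preserving insertion order (Python 3.7+); on older versions the keys may land in different chunks. Consuming g lazily instead of calling list(g) keeps only one chunk in memory at a time.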