I have found a solution but it is really slow:
def chunks(self, data, SIZE=10000):
    for i in xrange(0, len(data), SIZE):
        yield dict(data.items()[i:i+SIZE])
Do you have any ideas how to do this without using external modules (numpy, etc.)?
Since the dictionary is so big, it is better to keep all of the items involved as iterators and generators, like this:
from itertools import islice
def chunks(data, SIZE=10000):
    it = iter(data)
    for i in xrange(0, len(data), SIZE):
        yield {k: data[k] for k in islice(it, SIZE)}
Sample run:
for item in chunks({i: i for i in xrange(10)}, 3):
    print item
Output
{0: 0, 1: 1, 2: 2}
{3: 3, 4: 4, 5: 5}
{8: 8, 6: 6, 7: 7}
{9: 9}
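If you are on Python 3, the same idea should carry over with only minor changes (range instead of xrange, print() as a function); a minimal sketch:

from itertools import islice

def chunks(data, SIZE=10000):
    it = iter(data)                      # one shared iterator over the keys
    for i in range(0, len(data), SIZE):
        # pull the next SIZE keys off the shared iterator and look up their values
        yield {k: data[k] for k in islice(it, SIZE)}

for item in chunks({i: i for i in range(10)}, 3):
    print(item)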
Another method is zipping iterators:
>>> from itertools import izip_longest, ifilter
>>> d = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5, 'f':6, 'g':7, 'h':8}
Create a list with copies of the dict iterator (the number of copies is the number of elements in each result dict). Passing each iterator from the chunks list to izip_longest gives you the needed number of elements from the source dict (ifilter is used to remove the None padding from the zip results). A generator expression keeps memory usage low:
>>> chunks = [d.iteritems()]*3
>>> g = (dict(ifilter(None, v)) for v in izip_longest(*chunks))
>>> list(g)
[{'a': 1, 'c': 3, 'b': 2},
{'e': 5, 'd': 4, 'g': 7},
{'h': 8, 'f': 6}]
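On Python 3, a rough equivalent of the same zipping trick (izip_longest/ifilter become zip_longest and the built-in filter, and d.items() is wrapped in iter() so all three copies share one iterator) would be:

>>> from itertools import zip_longest
>>> d = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6, 'g': 7, 'h': 8}
>>> chunks = [iter(d.items())] * 3
>>> g = (dict(filter(None, v)) for v in zip_longest(*chunks))
>>> list(g)
[{'a': 1, 'b': 2, 'c': 3}, {'d': 4, 'e': 5, 'f': 6}, {'g': 7, 'h': 8}]

The exact grouping shown relies on dicts preserving insertion order (Python 3.7+); on older versions the keys may land in different chunks. Consuming g lazily instead of calling list(g) keeps only one chunk in memory at a time.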