Fastest method to dump a NumPy array into a string

Posted 2019-05-12 05:39

I need to organize a data file with chunks of named data. The data are NumPy arrays, but I don't want to use the numpy.save or numpy.savez functions, because in some cases the data have to be sent to a server over a pipe or another interface. So I want to dump a NumPy array into memory, compress it, and then send it to the server.

I've tried simple pickle, like this:

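try:
    import cPickle as pkl  # faster C implementation on Python 2
except ImportError:
    import pickle as pkl
import zlib
import numpy as np

def send_to_db(data, compress=5):
    # send() is assumed to be defined elsewhere (pipe, socket, ...)
    send(zlib.compress(pkl.dumps(data), compress))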

...but this is an extremely slow process.

Even with compression level 0 (no compression), the process is very slow, purely because of the pickling.

Is there any way to dump a NumPy array into a string without pickle? I know that NumPy lets you get a buffer with numpy.getbuffer, but it isn't obvious to me how to use this dumped buffer to obtain an array back.

2 Answers
唯我独甜
#2 · 2019-05-12 06:13

In Python 2, the default pickle protocol (protocol 0) produces pure ASCII output. To get (much) better performance, use the latest protocol available. Protocols 2 and above are binary and, if memory serves me right, allow NumPy arrays to dump their buffer directly into the stream without additional operations.

To select the protocol, pass the optional second argument when pickling (there is no need to specify it when unpickling), for instance pkl.dumps(data, 2). To pick the latest possible protocol, use pkl.dumps(data, -1).

Note that if you use different Python versions, you need to specify a protocol that the oldest of them supports. See the pickle documentation for details on the different protocols.
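
As a rough sketch of the advice above, the snippet below pickles an array with the highest available protocol and round-trips it through zlib. The helper names pack and unpack are hypothetical, and the code assumes Python 3, where the plain pickle module already uses the C implementation.

```python
import pickle
import zlib

import numpy as np


def pack(arr, level=5):
    """Pickle with the highest binary protocol, then compress (hypothetical helper)."""
    raw = pickle.dumps(arr, protocol=pickle.HIGHEST_PROTOCOL)
    return zlib.compress(raw, level)


def unpack(blob):
    """Reverse of pack(); the protocol is detected automatically when loading."""
    return pickle.loads(zlib.decompress(blob))


arr = np.arange(12, dtype=np.float64).reshape(3, 4)
restored = unpack(pack(arr))
```

Passing pickle.HIGHEST_PROTOCOL is equivalent to passing -1; if the receiving side runs an older Python, a fixed lower protocol number would be the safer choice.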

【Aperson】
#3 · 2019-05-12 06:31

You should definitely use numpy.save; you can still do it in memory:

>>> import io
>>> import numpy as np
>>> import zlib
>>> f = io.BytesIO()
>>> arr = np.random.rand(100, 100)
>>> np.save(f, arr)
>>> compressed = zlib.compress(f.getvalue())

And to decompress, reverse the process:

>>> np.load(io.BytesIO(zlib.decompress(compressed)))
array([[ 0.80881898,  0.50553303,  0.03859795, ...,  0.05850996,
         0.9174782 ,  0.48671767],
       [ 0.79715979,  0.81465744,  0.93529834, ...,  0.53577085,
         0.59098735,  0.22716425],
       [ 0.49570713,  0.09599001,  0.74023709, ...,  0.85172897,
         0.05066641,  0.10364143],
       ...,
       [ 0.89720137,  0.60616688,  0.62966729, ...,  0.6206728 ,
         0.96160519,  0.69746633],
       [ 0.59276237,  0.71586014,  0.35959289, ...,  0.46977027,
         0.46586237,  0.10949621],
       [ 0.8075795 ,  0.70107856,  0.81389246, ...,  0.92068768,
         0.38013495,  0.21489793]])
>>>

Which, as you can see, matches what we saved earlier:

>>> arr
array([[ 0.80881898,  0.50553303,  0.03859795, ...,  0.05850996,
         0.9174782 ,  0.48671767],
       [ 0.79715979,  0.81465744,  0.93529834, ...,  0.53577085,
         0.59098735,  0.22716425],
       [ 0.49570713,  0.09599001,  0.74023709, ...,  0.85172897,
         0.05066641,  0.10364143],
       ...,
       [ 0.89720137,  0.60616688,  0.62966729, ...,  0.6206728 ,
         0.96160519,  0.69746633],
       [ 0.59276237,  0.71586014,  0.35959289, ...,  0.46977027,
         0.46586237,  0.10949621],
       [ 0.8075795 ,  0.70107856,  0.81389246, ...,  0.92068768,
         0.38013495,  0.21489793]])
>>>
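
The round trip above could be wrapped in two small helpers. This is only a sketch: array_to_bytes and bytes_to_array are hypothetical names, and allow_pickle=False is an assumption that the arrays hold plain numeric dtypes rather than Python objects.

```python
import io
import zlib

import numpy as np


def array_to_bytes(arr, level=5):
    """Serialize with np.save into an in-memory buffer, then compress."""
    buf = io.BytesIO()
    np.save(buf, arr, allow_pickle=False)
    return zlib.compress(buf.getvalue(), level)


def bytes_to_array(blob):
    """Decompress and load the array back with np.load."""
    buf = io.BytesIO(zlib.decompress(blob))
    return np.load(buf, allow_pickle=False)


arr = np.random.rand(100, 100)
restored = bytes_to_array(array_to_bytes(arr))
print(np.array_equal(arr, restored))
```

Because the .npy format stores the dtype and shape in its header, the receiving side needs nothing beyond the bytes themselves to rebuild the array.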