I need to organize a data file with chunks of named data. The data are NumPy arrays. I don't want to use the numpy.save or numpy.savez functions, because in some cases the data has to be sent to a server over a pipe or another interface. So I want to dump a numpy array into memory, compress it, and then send it to the server.
I've tried simple pickle, like this:
try:
    import cPickle as pkl
except ImportError:
    import pickle as pkl
import zlib
import numpy as np

def send_to_db(data, compress=5):
    # send() is the transport function (pipe, socket, ...), defined elsewhere
    send(zlib.compress(pkl.dumps(data), compress))
...but this is an extremely slow process.
Even with compression level 0 (no compression), it is very slow, and purely because of the pickling.
Is there any way to dump a numpy array into a string without pickle? I know that numpy lets you get a buffer with numpy.getbuffer, but it isn't obvious to me how to use that dumped buffer to get an array back.
In Python 2, the default pickle protocol (protocol 0) produces pure ASCII output. To get (much) better performance, use the latest protocol version available. Protocol 2 and above are binary and, if memory serves me right, allow numpy arrays to dump their buffer directly into the stream without additional operations.
To select the protocol version, pass the optional argument when pickling (there is no need to specify it when unpickling), for instance pkl.dumps(data, 2). To pick the latest possible version, use pkl.dumps(data, -1).
Note that if the data is exchanged between different Python versions, you need to specify a protocol version that the oldest of them supports. See the pickle documentation for details on the different versions.
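To make the difference concrete, here is a small sketch that pickles the same array with protocol 0 and with the highest available protocol (-1) and compares the sizes. It assumes Python 3 (so plain pickle rather than cPickle), and the array and variable names are made up for illustration; timings will depend on your data and setup.

```python
import pickle

import numpy as np

# Hypothetical sample data: 10,000 float64 values (80,000 bytes of raw data).
arr = np.arange(10000, dtype=np.float64)

# Protocol 0 is the text-based format; raw bytes have to be escaped.
ascii_blob = pickle.dumps(arr, 0)

# -1 selects the highest protocol this interpreter supports (binary).
binary_blob = pickle.dumps(arr, -1)

print("protocol 0 size: ", len(ascii_blob))
print("protocol -1 size:", len(binary_blob))

# No protocol argument is needed when unpickling.
restored = pickle.loads(binary_blob)
print(np.array_equal(arr, restored))
```

The binary blob stays close to the raw size of the array, while the protocol 0 output is noticeably larger because of the escaping.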
You should definitely use numpy.save; you can still do it in memory:
And to decompress, reverse the process:
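A minimal sketch of both directions, assuming io.BytesIO as the in-memory file and zlib for compression. The helper names compress_array and decompress_array and the sample array are hypothetical, chosen only for this example:

```python
import io
import zlib

import numpy as np


def compress_array(arr, level=5):
    """Serialize arr in .npy format into memory, then zlib-compress it."""
    buf = io.BytesIO()
    # allow_pickle=False keeps the payload a plain .npy dump (no pickling).
    np.save(buf, arr, allow_pickle=False)
    return zlib.compress(buf.getvalue(), level)


def decompress_array(blob):
    """Reverse of compress_array: decompress, then load from memory."""
    buf = io.BytesIO(zlib.decompress(blob))
    return np.load(buf, allow_pickle=False)


# Hypothetical sample data.
original = np.arange(12, dtype=np.float64).reshape(3, 4)

blob = compress_array(original)      # bytes, ready to send over a pipe or socket
restored = decompress_array(blob)    # what the receiving side would do

print(restored)
print(np.array_equal(original, restored))
```

The .npy header stores the dtype and shape, so nothing besides the bytes themselves has to be transmitted.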
The loaded array matches what we saved earlier.