By default, pickling a NumPy view array loses the view relationship, even if the array's base is pickled too. In my situation, I have some complex container objects that get pickled, and in some cases some of the contained data are views into other contained data. Saving an independent array for each view not only wastes space; the reloaded data also lose the view relationship.
A simple example would be (though in my case the containers are more complex than a dictionary):
import numpy as np
import cPickle
tmp = np.zeros(2)
d1 = dict(a=tmp,b=tmp[:]) # d1 to be saved: b is a view on a
pickled = cPickle.dumps(d1)
d2 = cPickle.loads(pickled) # d2 reloaded copy of d1 container
print 'd1 before:', d1
d1['b'][:] = 1
print 'd1 after: ', d1
print 'd2 before:', d2
d2['b'][:] = 1
print 'd2 after: ', d2
which would print:
d1 before: {'a': array([ 0., 0.]), 'b': array([ 0., 0.])}
d1 after: {'a': array([ 1., 1.]), 'b': array([ 1., 1.])}
d2 before: {'a': array([ 0., 0.]), 'b': array([ 0., 0.])}
d2 after: {'a': array([ 0., 0.]), 'b': array([ 1., 1.])} # not a view anymore
My questions:
(1) Is there a way to preserve it? (2) Even better, is there a way to do it only if the base is pickled?
For (1), I think there may be a way by changing the __setstate__, __reduce_ex__, etc. of the view array, but I don't feel confident with these for now. For (2), I have no idea.
This isn't done in NumPy proper, because it doesn't always make sense to pickle the base array, and pickle does not expose the ability to check if another object is also being pickled as part of its API.
But this sort of check can be done in a custom container for NumPy arrays. For example:
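A minimal sketch of such a container follows, under a few assumptions. The class name `ArrayContainer`, its `arrays` attribute and the helper `_owner` are made up for illustration. It assumes plain numeric dtypes, and it only treats an array as a view when the array that owns its memory is stored in the same container and is C-contiguous; everything else is pickled as a normal, independent array. It uses `pickle` rather than `cPickle` so that it also runs on Python 3.

```python
import pickle
import numpy as np


def _owner(arr):
    """Follow .base until reaching the array at the end of the chain."""
    while isinstance(arr.base, np.ndarray):
        arr = arr.base
    return arr


class ArrayContainer(object):
    """Hypothetical holder that pickles views as references to their base."""

    def __init__(self, **arrays):
        self.arrays = dict(arrays)

    def __getstate__(self):
        # Map object identity to key, so we can tell whether a base is stored here.
        key_of = {id(a): k for k, a in self.arrays.items()}
        full, views = {}, {}
        for key, arr in self.arrays.items():
            owner = _owner(arr)
            base_key = key_of.get(id(owner))
            if (owner is not arr and base_key is not None
                    and owner.flags.c_contiguous):
                # Store only where the view sits inside its base, not the data.
                offset = (arr.__array_interface__['data'][0]
                          - owner.__array_interface__['data'][0])
                views[key] = (base_key, offset, arr.shape, arr.strides, arr.dtype)
            else:
                full[key] = arr  # base not in the container: pickle a copy
        return {'full': full, 'views': views}

    def __setstate__(self, state):
        self.arrays = dict(state['full'])
        for key, (base_key, offset, shape, strides, dtype) in state['views'].items():
            # Rebuild the view on top of the reloaded base's memory.
            self.arrays[key] = np.ndarray(shape, dtype=dtype,
                                          buffer=self.arrays[base_key],
                                          offset=offset, strides=strides)


tmp = np.zeros(2)
c1 = ArrayContainer(a=tmp, b=tmp[:])     # b is a view on a
c2 = pickle.loads(pickle.dumps(c1))      # reloaded copy of the container
c2.arrays['b'][:] = 1                    # writing through b also changes a in c2
```

Because the decision is made inside `__getstate__`, the container can see all of its arrays at once, which is exactly the information a single array's own pickling hooks do not have. One limitation of this sketch: it compares bases by identity, so a view whose base lives outside the container is silently saved as a copy.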
This results in significant space savings:
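To get a feel for the size of the effect, here is a small self-contained measurement. It is only an illustration: the helper `describe` and the array sizes are made up, and it compares pickling views directly against pickling the base once plus a small (offset, shape, strides) descriptor per view, which is the kind of information a view-aware container would need to store.

```python
import pickle
import numpy as np

base = np.zeros(100000)                       # 800,000 bytes of float64 data
views = [base[:], base[::2], base[50000:]]    # three views, no data of their own

# Plain pickling: every view is written out as a full, independent array.
naive = pickle.dumps({'base': base, 'views': views}, protocol=2)


def describe(view, owner):
    """Hypothetical helper: where the view sits inside its owner's memory."""
    offset = (view.__array_interface__['data'][0]
              - owner.__array_interface__['data'][0])
    return (offset, view.shape, view.strides)


# Reference-style pickling: the base once, plus a tiny descriptor per view.
compact = pickle.dumps({'base': base,
                        'views': [describe(v, base) for v in views]},
                       protocol=2)

print(len(naive), len(compact))
```

In this setup the plain pickle carries about 2.4 MB of array data (the base plus three copies of parts of it), while the reference-style pickle carries about 0.8 MB, so the saving grows with the number and size of the views.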