I have a pickled file. Its size is 9.3MB.
-rw-r--r-- 1 ankit ankit 9.3M Jan 7 17:43 agg_397127.pkl
I load it in Python using cPickle. I tried to determine its in-memory size using pympler's asizeof, but there is a considerable difference between the size reported by asizeof and the size reported by sys.getsizeof:
from pympler import asizeof
import cPickle as pickle
path = "agg_397127.pkl"
temp = pickle.load(open(path, 'rb'))
temp
{397127: RandomForestRegressor(bootstrap=True, criterion='band_predict',
max_depth=None, max_features='auto', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=1000, n_jobs=1,
oob_score=False, random_state=0, verbose=0, warm_start=False)}
asizeof.asizeof(temp)
1328504
asizeof.flatsize(temp)
import sys
sys.getsizeof(temp)
280
Can someone explain why there is such a difference?
sys.getsizeof() returns the size of the object passed to it, which in your example is a dictionary with one entry. It does NOT include the size of the complex class instance referred to by the dictionary, nor any of the objects referred to by that instance. ANY dictionary with only a few entries (up to 5, on my Python version) would return exactly the same number.
The asizeof module you're using attempts to recursively add up the sizes of all these referred objects. It doesn't seem to have done a very good job in this case, considering the huge discrepancy between the size it returned (about 1.3 MB) and the pickle size (9.3 MB). Note, however, that these numbers would never be exactly equal, since the format of a pickle on disk is necessarily different from the format of the actual objects in memory.
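To illustrate the difference between a shallow and a recursive measurement, here is a minimal sketch using only the standard library. The helper deep_getsizeof is a hypothetical name, not part of sys or pympler, and it only follows dicts, common containers, and instance __dict__ attributes, so it is a rough approximation of what a tool like asizeof does rather than a reimplementation of it. The two sample dictionaries are made up for the demonstration.

```python
import sys


def deep_getsizeof(obj, seen=None):
    """Roughly sum sys.getsizeof over obj and the objects it references.

    Hypothetical helper for illustration only: it follows dicts, lists,
    tuples, sets, and instance __dict__ attributes, and nothing else.
    """
    if seen is None:
        seen = set()
    if id(obj) in seen:
        # Already counted (shared or cyclic reference).
        return 0
    seen.add(id(obj))

    size = sys.getsizeof(obj)  # shallow size of this object only
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += deep_getsizeof(key, seen) + deep_getsizeof(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += deep_getsizeof(item, seen)
    elif hasattr(obj, "__dict__"):
        size += deep_getsizeof(vars(obj), seen)
    return size


# Two one-entry dictionaries whose values differ enormously in size.
small = {397127: "x"}
large = {397127: list(range(100000))}

shallow_small = sys.getsizeof(small)
shallow_large = sys.getsizeof(large)
deep_small = deep_getsizeof(small)
deep_large = deep_getsizeof(large)

print(shallow_small, shallow_large)  # shallow sizes ignore the values
print(deep_small, deep_large)        # recursive sizes reflect the values
```

The shallow sizes come out identical because sys.getsizeof only measures the dictionary itself, while the recursive totals differ because they also count the list and the integers it holds. Objects that keep their data outside the structures this sketch follows would still be undercounted.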