Why does pickle.dump(obj) give a different size than sys.getsizeof(obj)?

Posted 2019-08-30 11:20

I use the random forest classifier from Python's scikit-learn library for my exercise. The result changes on each run, so I run it 1000 times and take the average result.

I save the object rf to a file with pickle.dump() so I can predict later, and each file is about 4 MB. However, sys.getsizeof(rf) gives me just 36 bytes:

from sklearn.ensemble import RandomForestClassifier
import pickle
rf = RandomForestClassifier(n_estimators=50)
rf.fit(matX, vecY)
with open('var.sav', 'wb') as f:
    pickle.dump(rf, f)

My questions:

  • sys.getsizeof() seems to give the wrong size for a RandomForestClassifier object, doesn't it? Why?
  • How can I save the object in a zip/compressed file so that it takes up less space?

1 Answer
你好瞎i
#2 · 2019-08-30 11:41

getsizeof() gives you the memory footprint of just the object, and not of any other values referenced by that object. You'd need to recurse over the object to find the total size of all attributes too, and anything those attributes hold, etc.
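To make that concrete, here is a rough sketch of what a "deep" size check would have to do: walk the instance's __dict__, mappings, and sequences, and add up the pieces. It is only an estimate (it ignores __slots__, shared numpy buffers, and other details), but it shows why the shallow number is tiny:

import sys

def deep_getsizeof(obj, seen=None):
    """Rough recursive size estimate; a sketch, not a complete accounting."""
    seen = set() if seen is None else seen
    if id(obj) in seen:          # avoid counting shared objects twice
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if hasattr(obj, '__dict__'):                      # instance attributes
        size += deep_getsizeof(vars(obj), seen)
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    return size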

Pickle is a serialization format. Serialization needs to store metadata as well as the contents of the object, so memory size and pickle size have only a rough correlation.
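For example (assuming rf is the fitted classifier from the question), the shallow size and the serialized size measure two different things:

import pickle
import sys

raw = pickle.dumps(rf)      # serialize to an in-memory byte string
print(sys.getsizeof(rf))    # shallow size of the estimator object itself (a few dozen bytes)
print(len(raw))             # size of the full byte stream, including every fitted tree (~4 MB here)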

Pickles are byte streams; if you need a more compact byte stream, use compression.
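For instance, writing the pickle through gzip keeps the usual dump/load calls (a sketch; 'var.sav.gz' is just an example filename):

import gzip
import pickle

with gzip.open('var.sav.gz', 'wb') as f:   # gzip file object is a writable binary stream
    pickle.dump(rf, f)

with gzip.open('var.sav.gz', 'rb') as f:   # and a readable one for loading it back
    rf = pickle.load(f)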

If you are storing your pickles in a ZIP archive, your data is already being compressed, so compressing the pickle before adding it to the ZIP will not help in that case. Already-compressed data can even come out slightly larger after a second round of ZIP compression, because of the extra metadata overhead and the lack of duplicate patterns left in typical compressed data.
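If you do want a ZIP container, let the archive itself do the compressing and add the plain pickle bytes to it (a sketch; 'model.zip' and the member name are just example names):

import pickle
import zipfile

with zipfile.ZipFile('model.zip', 'w', compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('var.sav', pickle.dumps(rf))   # the archive compresses the bytes once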
