Pickle dump huge file without memory error

Posted 2019-03-11 02:39

I have a program where I basically adjust the probability of certain things happening based on what is already known. My data file is already saved as a pickled dictionary object in Dictionary.txt. The problem is that every time I run the program, it pulls in Dictionary.txt, turns it into a dictionary object, makes its edits, and overwrites Dictionary.txt. This is pretty memory-intensive, as Dictionary.txt is 123 MB. I get a MemoryError when I dump; everything seems fine when I pull it in.

  • Is there a better (more efficient) way of doing the edits? (Perhaps without having to overwrite the entire file every time.)

  • Is there a way I can invoke garbage collection (through the gc module)? (I already have it auto-enabled via gc.enable().)

  • I know that besides readlines() you can read line by line. Is there a way to edit the dictionary incrementally, line by line, when I already have a fully completed dictionary object file in the program?

  • Any other solutions?

Thank you for your time.

9 Answers
Viruses.
Answer 2 · 2019-03-11 02:48

I recently had this problem. After trying cPickle with both ASCII and the binary protocol 2, I found that my SVM from scikit-learn, trained on 20+ GB of data, was not pickling due to a memory error. However, the dill package seemed to resolve the issue. Dill will not bring many improvements for a dictionary, but it may help with streaming, since it is meant to stream pickled bytes across a network.

import dill

# dump the object (e.g. your dictionary) to disk
with open(path, 'wb') as fp:
    dill.dump(obj, fp)

# load it back later, from a file opened for reading
with open(path, 'rb') as fp:
    obj = dill.load(fp)

If efficiency is an issue, try loading/saving to a database. In this instance, your storage solution may be the issue. At 123 MB, Pandas should be fine. However, if the machine has limited memory, SQL offers fast, optimized bag operations over data, usually with multithreaded support. My poly-kernel SVM saved.
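For the database route, here is a minimal sketch (not from the original answer) using the standard-library sqlite3 module; the file name dictionary.db and the put/get helpers are made up for illustration. Each value is pickled separately, so updating one entry no longer rewrites the whole 123 MB file:

import pickle
import sqlite3

conn = sqlite3.connect('dictionary.db')
conn.execute('CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value BLOB)')

def put(key, obj):
    # pickle values individually so only the changed row is written
    conn.execute('INSERT OR REPLACE INTO kv VALUES (?, ?)',
                 (key, pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)))
    conn.commit()

def get(key):
    row = conn.execute('SELECT value FROM kv WHERE key = ?', (key,)).fetchone()
    return pickle.loads(row[0]) if row else None

put('some_event', {'count': 3, 'probability': 0.7})
print(get('some_event'))
conn.close()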

再贱就再见
Answer 3 · 2019-03-11 02:49

This may seem trivial, but try using 64-bit Python if you are not already.
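For reference, a quick way to check which build you are running (sys.maxsize is larger than 2**32 only on 64-bit interpreters):

import sys

# prints "64-bit" on a 64-bit build of Python, "32-bit" otherwise
print('64-bit' if sys.maxsize > 2**32 else '32-bit')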

可以哭但决不认输i
Answer 4 · 2019-03-11 02:56

I was having the same issue. I used joblib and the work was done. Posting in case someone wants to know about other possibilities.

Save the model to disk:

from sklearn.externals import joblib   # in recent scikit-learn versions, use "import joblib" instead

filename = 'finalized_model.sav'
joblib.dump(model, filename)

Some time later, load the model from disk:

loaded_model = joblib.load(filename)
result = loaded_model.score(X_test, Y_test)
print(result)
唯我独甜
Answer 5 · 2019-03-11 03:01

If your keys and values are strings, you can use one of the embedded persistent key-value storage engines available in the Python standard library. This example is from the anydbm module docs (in Python 3, anydbm was renamed to dbm):

import anydbm

# Open database, creating it if necessary.
db = anydbm.open('cache', 'c')

# Record some values
db['www.python.org'] = 'Python Website'
db['www.cnn.com'] = 'Cable News Network'

# Loop through contents.  Other dictionary methods
# such as .keys(), .values() also work.
for k, v in db.iteritems():
    print k, '\t', v

# Storing a non-string key or value will raise an exception (most
# likely a TypeError).
db['www.yahoo.com'] = 4

# Close when done.
db.close()
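In Python 3, the shelve module builds the same idea on top of dbm and pickles each value, so values are not limited to strings. A minimal sketch, not part of the original answer:

import shelve

# entries are pickled individually, so updating one key does not
# rewrite the whole store
with shelve.open('cache') as db:
    db['www.python.org'] = {'count': 3, 'probability': 0.7}
    db['www.cnn.com'] = 'Cable News Network'
    for k in db:
        print(k, db[k])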
来,给爷笑一个
Answer 6 · 2019-03-11 03:01

Have you tried using streaming pickle? https://code.google.com/p/streaming-pickle/

I have just solved a similar memory error by switching to streaming pickle.
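The linked project is third-party and may no longer be available, but the core idea can be reproduced with the standard pickle module by dumping the dictionary one item at a time, so the whole object never has to be serialized in one go. A rough sketch (the function names are just for illustration):

import pickle

def stream_dump(d, path):
    # write each (key, value) pair as its own pickle record
    with open(path, 'wb') as fp:
        for item in d.items():
            pickle.dump(item, fp, pickle.HIGHEST_PROTOCOL)

def stream_load(path):
    # read records back one at a time until the file is exhausted
    d = {}
    with open(path, 'rb') as fp:
        while True:
            try:
                key, value = pickle.load(fp)
            except EOFError:
                break
            d[key] = value
    return d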

倾城 Initia
Answer 7 · 2019-03-11 03:03

How about this?

import cPickle as pickle   # Python 2; on Python 3, just "import pickle"

with open("temp.p", "wb") as f:
    p = pickle.Pickler(f)
    p.fast = True   # disables the memo: lower memory use, but fails on recursive/shared objects
    p.dump(d)       # d is your dictionary (or any picklable object)