I am trying to use _pickle
to save data to disk, but when calling _pickle.dump
I get an error:
OverflowError: cannot serialize a bytes object larger than 4 GiB
Is this a hard limitation of _pickle
(cPickle
for Python 2)?
There is a great answer above explaining why pickle doesn't work. But that doesn't help on Python 2.7, which is a problem if you are still on Python 2.7 and want to support large files, especially NumPy (NumPy arrays over 4 GiB fail).
You can use OC serialization, which has been updated to work with data over 4 GiB. There is a Python C extension module available from:
http://www.picklingtools.com/Downloads
Take a look at the documentation:
http://www.picklingtools.com/html/faq.html#python-c-extension-modules-new-as-of-picklingtools-1-6-0-and-1-3-3
But here's a quick summary: there are ocdumps and ocloads, very much like pickle's dumps and loads.
OC serialization is 1.5-2x faster and also works with C++ (if you are mixing languages). It works with all built-in types, but not classes (partly because it is cross-language and it's hard to build C++ classes from Python).
Yes, this is a hard-coded limit; it comes from the
save_bytes
function in the C pickle implementation. The protocol uses 4 bytes to write the size of the object to disk, which means you can only track sizes of up to 2**32 == 4 GiB.
If you can break up the
bytes
object into multiple objects, each smaller than 4 GiB, you can of course still save the data to a pickle.
This is no longer the case in Python 3.4, which implements PEP 3154 and pickle protocol 4:
https://www.python.org/dev/peps/pep-3154/
But you need to say you want to use version 4 of the protocol:
https://docs.python.org/3/library/pickle.html
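For example, passing the protocol explicitly (a minimal sketch; with protocol 4 the 4 GiB limit on bytes objects goes away, though the data here is kept small):

```python
import io
import pickle

data = {"payload": b"\x00" * 1024}  # stand-in for a large bytes object

buf = io.BytesIO()
# Ask for protocol 4 explicitly; it only became the default in Python 3.8.
pickle.dump(data, buf, protocol=4)

buf.seek(0)
restored = pickle.load(buf)
assert restored == data
```

On Python 3.4-3.7 the default protocol is 3, so without `protocol=4` you would still hit the OverflowError on large bytes objects.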