How does Python 3 know how to pickle extension types

Posted 2019-04-11 03:00

Question:

Numpy arrays, being extension types (that is, defined in extensions using the C API), declare additional fields outside the scope of the Python interpreter (for example the data attribute, which is a Buffer Structure, as documented in Numpy's array interface).
To serialize such objects, Python 2 used the __reduce__ method as part of the pickle protocol, as stated in the documentation.

But although __reduce__ still exists in Python 3, the "Pickle protocol" section (and, a fortiori, "Pickling and unpickling extension types") was removed from the documentation, so it is unclear which mechanism does what.
Moreover, there are additional entries that relate to pickling extension types:

  • copyreg, described as "Pickle interface constructor registration for extension types", although the copyreg module documentation itself never mentions extension types.
  • PEP 3118 -- Revising the buffer protocol, which introduced a new buffer protocol for Python 3 (and might automate pickling for objects that support it).
  • New-style classes: presumably they also influence the pickling process.
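For reference, this is what copyreg registration looks like in practice. A minimal sketch, assuming a hypothetical Vector class and a hypothetical reduce_vector function (neither comes from Numpy): copyreg.pickle() stores the reduction function in copyreg.dispatch_table, and pickle consults that table before falling back to the type's own __reduce_ex__(). This is the route for adding pickling support to a type you cannot modify; it says nothing about whether Numpy itself uses it.

```python
import copyreg
import pickle

reduced = []  # records each call so we can see that pickle used the registry


class Vector:
    """Hypothetical stand-in for a type whose source we cannot change."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


def reduce_vector(v):
    # Same contract as __reduce__: a callable plus the arguments to rebuild v.
    reduced.append(v)
    return (Vector, (v.x, v.y))


# Register the reduction function; this stores it in copyreg.dispatch_table.
copyreg.pickle(Vector, reduce_vector)

v = pickle.loads(pickle.dumps(Vector(1, 2)))
print(v.x, v.y)  # 1 2
print(len(reduced))  # 1
```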

So, how does all of this relate to Numpy arrays?

  1. Do Numpy arrays implement special methods such as __reduce__ (or register with copyreg) to tell Python how to pickle them? Numpy objects still expose a __reduce__ method, but that may be only for compatibility.
  2. Does Numpy use C-API structures that pickle supports out of the box (like the new buffer protocol), so that nothing extra is needed to pickle Numpy arrays?
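Question 2 can be probed with a stdlib analog instead of Numpy. The sketch below uses array.array, a C-implemented type that stores its items in a contiguous buffer and supports the PEP 3118 buffer protocol; it is a stand-in chosen for illustration, not Numpy itself. It suggests that exposing a buffer is not what makes an object picklable: the array round-trips because its type defines __reduce_ex__, while a memoryview over the very same buffer cannot be pickled.

```python
import array
import pickle

# array.array is a C-implemented stdlib type that keeps its items in one
# contiguous buffer and exposes that buffer via the PEP 3118 buffer protocol.
a = array.array("i", [1, 2, 3])
view = memoryview(a)  # a zero-copy view of the same buffer
print(view.format, view.tolist())  # i [1, 2, 3]

# The array pickles because its type defines its own reduction method...
print("__reduce_ex__" in vars(array.array))  # True
restored = pickle.loads(pickle.dumps(a))
print(restored == a)  # True

# ...but exposing a buffer by itself does not make an object picklable.
view_error = None
try:
    pickle.dumps(view)
except TypeError as exc:
    view_error = exc
print(type(view_error).__name__)  # TypeError
view.release()
```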

Answer 1:

Python 3's pickle still supports __reduce__; it is covered in the "Pickling Class Instances" section of the documentation.

Numpy's support has not changed in this regard; it implements __reduce__ on arrays to support pickling in either Python 2 or 3:

>>> import numpy
>>> numpy.array(0).__reduce__()
(<built-in function _reconstruct>, (<class 'numpy.ndarray'>, (0,), b'b'), (1, (), dtype('int64'), False, b'\x00\x00\x00\x00\x00\x00\x00\x00'))

A three-element tuple is returned, consisting of a function object to recreate the value, a tuple of arguments for that function, and a state tuple that is passed to the new instance's __setstate__() method.
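To make the three-element contract concrete without depending on Numpy internals, here is a minimal pure-Python sketch. RawBuffer and _rebuild are hypothetical names: RawBuffer stands in for an extension type whose data lives in a raw byte buffer, and _rebuild plays the role that _reconstruct plays for ndarray. On load, pickle first calls the function with the arguments, then hands the state to __setstate__() on the result; the last lines repeat those steps by hand.

```python
import pickle


def _rebuild(cls, shape):
    # Hypothetical counterpart of numpy's _reconstruct: build an empty
    # instance of the right class; the data arrives later via __setstate__.
    obj = cls.__new__(cls)
    obj.shape = shape
    obj.buffer = bytearray()
    return obj


class RawBuffer:
    """Hypothetical extension-like type holding its data in a raw byte buffer."""

    def __init__(self, data):
        self.shape = (len(data),)
        self.buffer = bytearray(data)

    def __reduce__(self):
        # (callable, args for the callable, state for __setstate__)
        return (_rebuild, (type(self), self.shape), bytes(self.buffer))

    def __setstate__(self, state):
        # Called by pickle on the object that _rebuild returned.
        self.buffer = bytearray(state)


original = RawBuffer(b"\x01\x02\x03")
restored = pickle.loads(pickle.dumps(original))
print(restored.shape, bytes(restored.buffer))  # (3,) b'\x01\x02\x03'

# The same steps done by hand, mirroring what pickle does on load.
func, args, state = original.__reduce__()
manual = func(*args)
manual.__setstate__(state)
print(manual.buffer == original.buffer)  # True
```

ndarray follows the same pattern and defines its own __setstate__(), so the tuple printed above should be replayable in the same way.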