Python 2.7 introduced a new API for buffers and memoryview objects.
I read the documentation on them and I think I got the basic concept (accessing an object's internal data in raw form without copying it, which I suppose means a faster and less memory-hungry way to get at that data), but really understanding the documentation seems to require more knowledge of C than I have.
I would be very grateful if somebody would take the time to:
- explain buffers and memoryview objects in "layman terms" and
- describe a scenario in which using buffers and memoryview objects would be "the Pythonic way" of doing things
Here's a line from a hash function I wrote:
M = tuple(buffer(M, i, Nb) for i in range(0, len(M), Nb))
This splits a long string, M, into shorter 'strings' of length Nb, where Nb is the number of bytes / characters I can handle at a time. It does this WITHOUT copying any part of the string, which is what would happen if I sliced the string like so:
M = tuple(M[i:i+Nb] for i in range(0, len(M), Nb))
I can now iterate over M just as I would had I sliced it:
H = key
for Mi in M:
    H = encrypt(H, Mi)
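As a rough sketch of the same idea with memoryview instead of buffer (buffer only exists in Python 2, so this should also run on Python 3): the helper name chunk_views is made up for illustration, and zlib.crc32 is only a stand-in for my encrypt function, which isn't shown here.

```python
import zlib


def chunk_views(data, size):
    """Return a tuple of views over consecutive size-byte chunks of data.

    Hypothetical helper: slicing a memoryview gives another view onto
    the same memory, so no chunk is copied.
    """
    view = memoryview(data)
    return tuple(view[i:i + size] for i in range(0, len(view), size))


message = b"abcdefghij"
chunks = chunk_views(message, 4)

# tobytes() makes a copy on demand; the views themselves hold no copies.
chunk_bytes = [c.tobytes() for c in chunks]

# Stand-in for encrypt(H, Mi): fold a running CRC-32 over the chunks.
checksum = 0
for chunk in chunks:
    checksum = zlib.crc32(chunk, checksum)
```

Because the running CRC is fed chunk by chunk, the final checksum should match zlib.crc32(message) computed in one go, which is a handy way to check that the views cover the whole string.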
Basically, buffers and memoryviews are efficient ways to work around the immutability of strings in Python and the copying that slicing and similar operations do. A memoryview is like a buffer, except that you can also write through it when the underlying object is writable (a bytearray, for example), not just read.
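A minimal sketch of the writing part, assuming a bytearray as the underlying object (the variable names are just for illustration). A view over an immutable string is read-only, so the write at the end is expected to fail:

```python
data = bytearray(b"hello world")
view = memoryview(data)

# Assigning to a slice of the view modifies the bytearray in place.
# The replacement must be the same length as the slice.
view[0:5] = b"HELLO"

# A view over an immutable bytes object is read-only.
readonly_view = memoryview(b"hello")
try:
    readonly_view[0:1] = b"J"
    write_failed = False
except TypeError:
    write_failed = True
```

After this runs, data holds the modified bytes even though it was never assigned to directly, and the readonly attribute on each view tells you up front whether a write will be allowed.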
While the main buffer / memoryview doc is about the C implementation, the standard types page has a bit of info under memoryview: http://docs.python.org/library/stdtypes.html#memoryview-type
Edit: I found this in my bookmarks: http://webcache.googleusercontent.com/search?q=cache:Ago7BXl1_qUJ:mattgattis.com/2010/3/9/python-memory-views+site:mattgattis.com+python&hl=en&client=firefox-a&gl=us&strip=1 is a REALLY good brief writeup.
Edit 2: Turns out I got that link from "When should a memoryview be used?" in the first place. That question was never answered in detail and the link was dead, so hopefully this helps.
Part of the answer I was looking for is that buffer is the "old way" and memoryview is the new way, but was backported to 2.7 - see the archived blog here
This doesn't answer my question of why the C API I thought I implemented in 2.7 lets me construct a buffer but not a memoryview...
To get memoryview to work in Python 2.7, you need to have the Py_TPFLAGS_HAVE_NEWBUFFER flag set in tp_flags. I found that the built-in bytearray source was a good reference; it is in Include/bytearrayobject.h and Objects/bytearrayobject.c.