Is an explicit NUL-byte necessary at the end of a

Posted 2019-08-01 22:46

Question:

When converting a bytearray object (or a bytes object, for that matter) to a C string, the Cython documentation recommends using the following:

cdef char * cstr = py_bytearray

There is no overhead, because cstr points directly into the buffer of the bytearray object.
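The zero-copy behaviour can be observed from plain Python with a minimal sketch, assuming CPython 3 and using `ctypes.pythonapi` to call `PyByteArray_AsString` (the C-API function Cython uses for this conversion). The names `as_string`, `ba` and `ptr` are illustrative only. Mutating the bytearray in place is visible through the raw pointer, which shows that it aliases the object's buffer rather than a copy:

```python
import ctypes

# PyByteArray_AsString returns a char* into the object's buffer.
# restype=c_void_p keeps the raw address instead of copying into bytes.
as_string = ctypes.pythonapi.PyByteArray_AsString
as_string.restype = ctypes.c_void_p
as_string.argtypes = [ctypes.py_object]

ba = bytearray(b'text')
ptr = as_string(ba)

# Mutate in place; the size does not change, so the buffer does not move.
ba[0] = ord('T')

# The pointer sees the change: it aliases the buffer, nothing was copied.
print(ctypes.string_at(ptr, len(ba)))  # b'Text'
```

The pointer stays valid only while the bytearray is alive and not resized; any operation that grows or shrinks it may move the buffer.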

However, C strings are null-terminated, so in order to pass cstr to a C function, the buffer must also be null-terminated. The Cython documentation doesn't say whether the resulting C strings are null-terminated.

It is possible to add a NUL byte explicitly to the bytearray object, e.g. by using b'text\x00' instead of just b'text'. Yet this is cumbersome and easy to forget, and there is at least experimental evidence that the explicit NUL byte is not needed:

%%cython
from libc.stdio cimport printf
def printit(py_bytearray):
    cdef char *ptr = py_bytearray
    printf("%s\n", ptr)

And now

printit(bytearray(b'text'))

prints the desired "text" to stdout (which, in the case of an IPython notebook, is obviously not the output shown in the browser).

But is this a lucky coincidence, or is there a guarantee that the buffer of a bytearray object (or a bytes object) is null-terminated?

Answer 1:

I think it's safe (at least in Python 3), however I'd be a bit wary.

Cython uses the C-API function PyByteArray_AsString. The Python 3 documentation for it says "The returned array always has an extra null byte appended." The Python 2 documentation has no such note, so it's harder to be sure that it's safe there.
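The documented guarantee can be probed with a minimal sketch, assuming CPython 3 and `ctypes.pythonapi`. It calls `PyByteArray_AsString` and its bytes counterpart `PyBytes_AsString` (whose documentation also promises a trailing null byte) and reads one byte past `len(obj)`. The helper `with_terminator` is hypothetical, and reading past the payload is only safe because of that documented guarantee; this illustrates the behaviour, it does not prove it for every Python version:

```python
import ctypes

# C-API accessors for the internal buffers of bytearray and bytes.
# restype=c_void_p makes ctypes return the raw address, not a bytes copy.
funcs = {
    bytearray: ctypes.pythonapi.PyByteArray_AsString,
    bytes: ctypes.pythonapi.PyBytes_AsString,
}
for f in funcs.values():
    f.restype = ctypes.c_void_p
    f.argtypes = [ctypes.py_object]


def with_terminator(obj):
    """Read len(obj) + 1 bytes: the payload plus the byte right after it."""
    ptr = funcs[type(obj)](obj)
    return ctypes.string_at(ptr, len(obj) + 1)


print(with_terminator(bytearray(b'text')))  # b'text\x00'
print(with_terminator(b'text'))             # b'text\x00'
```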

Practically speaking, I think Python deals with this by always over-allocating bytearrays by one byte and NUL-terminating them (see the CPython source code for one example of where this is done).
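One way to see the terminator being maintained as the bytearray grows is the following sketch, again assuming CPython 3 and `ctypes.pythonapi`; `with_terminator` is a hypothetical helper. The pointer is fetched again after each resize because growing the bytearray may move its buffer:

```python
import ctypes

as_string = ctypes.pythonapi.PyByteArray_AsString
as_string.restype = ctypes.c_void_p
as_string.argtypes = [ctypes.py_object]


def with_terminator(ba):
    """Fetch the current buffer address and read len(ba) + 1 bytes."""
    return ctypes.string_at(as_string(ba), len(ba) + 1)


ba = bytearray(b'text')
ba.append(ord('!'))          # grow by one byte
print(with_terminator(ba))   # b'text!\x00'
ba.extend(b' more')          # grow by several bytes at once
print(with_terminator(ba))   # b'text! more\x00'
```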

The only reason to be a bit cautious is that bytearrays (and Python strings, for that matter) may legitimately contain a 0 byte in the middle, so a NUL byte isn't a reliable indicator of where the data ends. Therefore, you should really be using their len anyway. (This is a weak argument, though, especially since you're probably the one initializing them, so you know whether embedded NUL bytes can occur.)
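The embedded-NUL pitfall can be illustrated with a short sketch, assuming CPython 3 and `ctypes.pythonapi`. `ctypes.string_at(ptr)` without a size reads until the first NUL byte, much as `printf("%s", ptr)` or `strlen(ptr)` would, while passing `len(ba)` recovers the full payload:

```python
import ctypes

as_string = ctypes.pythonapi.PyByteArray_AsString
as_string.restype = ctypes.c_void_p
as_string.argtypes = [ctypes.py_object]

ba = bytearray(b'te\x00xt')  # contains an embedded NUL byte
ptr = as_string(ba)

# C-string semantics: stops at the first NUL, silently dropping b'xt'.
print(ctypes.string_at(ptr))           # b'te'
# Length-aware read: uses the Python length, so nothing is lost.
print(ctypes.string_at(ptr, len(ba)))  # b'te\x00xt'
```

On the C side, the length-aware equivalent is to pass the length explicitly to a function that takes one, e.g. `fwrite(ptr, 1, n, stdout)`, rather than relying on the terminator.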


(My initial version of this answer had something about _PyByteArray_empty_string. @ead pointed out in the comments that I was mistaken about this and hence it's edited out...)