When converting a bytearray object (or a bytes object, for that matter) to a C string, the Cython documentation recommends the following:
cdef char * cstr = py_bytearray
There is no overhead, as cstr points directly to the buffer of the bytearray object.
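The zero-copy claim can be checked from plain Python. The sketch below is CPython-only and is an illustration, not Cython itself. It uses ctypes.pythonapi to call PyByteArray_AsString, the C-API function that returns a bytearray's internal buffer. It then mutates the bytearray in place and reads the change back through the raw address. The names as_string, ba and ptr are hypothetical.

```python
import ctypes

# CPython-only: ctypes.pythonapi exposes the running interpreter's C-API.
as_string = ctypes.pythonapi.PyByteArray_AsString
as_string.restype = ctypes.c_void_p      # raw address of the internal buffer
as_string.argtypes = [ctypes.py_object]

ba = bytearray(b"text")
ptr = as_string(ba)                       # no copy is made here

# Same-length in-place mutation does not reallocate the buffer ...
ba[0] = ord("T")
# ... so the change is visible through the old pointer: it aliases ba.
print(ctypes.string_at(ptr, len(ba)))     # b'Text'
```

Resizing the bytearray (for example with extend) may reallocate its buffer, which would leave ptr dangling. The same caveat applies to a char pointer obtained in Cython.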
However, C strings are null-terminated, so cstr can only be passed to a C function that expects a C string if the buffer is null-terminated too. The Cython documentation doesn't say whether the resulting C strings are null-terminated.
It is possible to add a NUL byte to the bytearray object explicitly, e.g. by using b'text\x00' instead of just b'text'. Yet this is cumbersome and easy to forget, and there is at least experimental evidence that the explicit NUL byte is not needed:
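%%cython
# requires %load_ext Cython in an earlier notebook cell
from libc.stdio cimport printf

def printit(py_bytearray):
    # No copy: ptr points straight into the bytearray's internal buffer
    cdef char *ptr = py_bytearray
    # %s reads until the first NUL byte, so this relies on null termination
    printf("%s\n", ptr)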
Now, printit(bytearray(b'text')) prints the desired "text" to stdout. In an IPython notebook, this C-level stdout is not the output shown in the browser.
But is this a lucky coincidence, or is there a guarantee that the buffer of a bytearray object (or a bytes object) is null-terminated?
I think it's safe (at least in Python 3), but I'd be a bit wary.
Cython uses the C-API function PyByteArray_AsString. The Python 3 documentation for it says "The returned array always has an extra null byte appended." The Python 2 version does not have that note, so it's difficult to be sure it's safe there. Practically speaking, I think Python deals with this by always over-allocating bytearrays by one byte and null-terminating them (see the source code for one example of where this is done).
The only reason to be a bit cautious is that it's perfectly acceptable for bytearrays (and Python strings, for that matter) to contain a 0 byte within the string, so a 0 byte isn't a reliable indicator of where the end is. Therefore, you should really be using their len anyway. (This is a weak argument, though, especially since you're probably the one initializing them, so you know whether they can contain 0 bytes.)
(My initial version of this answer had something about _PyByteArray_empty_string. @ead pointed out in the comments that I was mistaken about this, and hence it's edited out.)
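The point about embedded 0 bytes and using len can be illustrated the same way. This CPython-only ctypes sketch is an illustration, not the Cython code from the question. Reading the buffer as a NUL-terminated string, as printf("%s") would, stops at the first 0 byte. A read bounded by len recovers the full contents.

```python
import ctypes

# CPython-only: get the address of a bytearray's internal buffer.
as_string = ctypes.pythonapi.PyByteArray_AsString
as_string.restype = ctypes.c_void_p
as_string.argtypes = [ctypes.py_object]

ba = bytearray(b"te\x00xt")             # contains an embedded NUL byte
ptr = as_string(ba)

# Without a size, string_at scans for the first NUL, like printf("%s") ...
print(ctypes.string_at(ptr))            # b'te'
# ... while reading exactly len(ba) bytes returns everything.
print(ctypes.string_at(ptr, len(ba)))   # b'te\x00xt'
```

In Cython, the corresponding approach is to keep len(py_bytearray) next to the char pointer. Then use a length-aware C function such as fwrite or memcpy rather than one that scans for the terminator.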