I have read many posts asking how to convert a C++ `std::string` or `const std::string&` to a `char*` in order to pass it to a C function, and it seems there are quite a few caveats in doing this. One has to be careful about whether the string is contiguous, and about a lot of other things. The point is that I've never really understood all the things one needs to be aware of, and why.
Could someone sum up the caveats and pitfalls of converting a `std::string` to a `char*` that needs to be passed to a C function?
That is, when the `std::string` is a `const` reference and when it is a non-const reference, and when the C function will alter the `char*` and when it will not.
First, whether it is a const reference or a value doesn't change anything.
You then have to consider what the function is expecting. There are different things a function can do with a `char*` or a `char const*`: the original versions of `memcpy`, for example, used these types, and it's possible that such code is still around. It is, hopefully, rare, and in the following I will assume that the `char*` in the C function refers to a `'\0'`-terminated string.
If the C function takes a `char const*`, you can pass it the result of `std::string::c_str()`; if it takes a `char*`, it depends. If it takes a `char*` simply because it dates from the pre-`const` days of C, and in fact modifies nothing, `std::string::c_str()` followed by a `const_cast` is appropriate. If the C function is using the `char*` as an out parameter, however, things become more difficult. I personally prefer declaring a `char[]` buffer, passing this, and then converting the result to `std::string`. But all known implementations of `std::string` use a contiguous buffer, and the next version of the standard will require it, so you can also correctly size the `std::string` first (using `std::string::resize()`), pass `&s[0]`, and afterwards resize the string to the resulting length (determined using `strlen(s.c_str())`), if necessary.
Finally (but this is also an issue for C programs using `char[]`), you have to consider any lifetime issues. Most functions taking `char*` or `char const*` simply use the pointer and forget it, but if the function saves the pointer somewhere for later use, the string object must live at least as long, and its size should not be modified during that period. (Again, in such cases, I prefer using a `char[]`.)
std::string can store zero bytes. This means that when it is passed to a C function it can be truncated prematurely, as C functions stop at the first zero byte. This can have security implications if, for example, you use a C function to filter out or escape unwanted characters.
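
The truncation caused by embedded zero bytes can be seen in a small sketch. The helper names `c_visible_length` and `safe_for_c_api` are made up for illustration and are not part of any library; they only show how the C view of a string can differ from what `size()` reports.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Length as a C function would see it: everything up to the first '\0'.
std::size_t c_visible_length(const std::string& s) {
    return std::strlen(s.c_str());
}

// Hypothetical guard: true if the string contains no embedded '\0'.
bool safe_for_c_api(const std::string& s) {
    return s.find('\0') == std::string::npos;
}

// Usage example: a 7-character string with a zero byte in the middle.
void embedded_nul_demo() {
    const std::string s("abc\0def", 7);
    assert(s.size() == 7);
    assert(c_visible_length(s) == 3);   // the C side stops at the first '\0'
    assert(!safe_for_c_api(s));
}
```

A check like this could run before handing user-controlled data to a C routine, so that the part after the zero byte is not silently skipped.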
The pointer returned by std::string::c_str() can be invalidated by any operation that modifies the string (non-const member functions). If you call c_str(), then modify the string, and then use the old pointer, you get very hard-to-diagnose bugs ("Heisenbugs").
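
One way to stay clear of stale pointers is to call `c_str()` only at the point of the C call, after all modifications are done. The following is a minimal sketch of that habit; `c_length` is a hypothetical stand-in for a C function that only reads its argument, and `append_then_measure` is likewise an invented name.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical stand-in for a C function that only reads its argument.
std::size_t c_length(const char* text) {
    return std::strlen(text);
}

std::size_t append_then_measure(std::string& s, const std::string& suffix) {
    // Risky alternative (not done here): take p = s.c_str() first, append,
    // then use p. The append may reallocate and leave p dangling.
    s += suffix;
    return c_length(s.c_str());   // pointer fetched after the modification
}

// Usage example.
void stale_pointer_demo() {
    std::string s = "abc";
    assert(append_then_measure(s, "def") == 6);
    assert(s == "abcdef");
}
```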
Do not use `const_cast`, ever. `goto` is less troublesome.
[I would add a comment, but I don't have enough rep for that, so sorry for adding (yet) another answer.]
While it is true that the current standard does not guarantee the internal buffer of std::string to be contiguous, it appears that practically all implementations use contiguous buffers. Furthermore, the new C++0x standard (which is about to be approved by ISO) requires contiguous internal buffers in std::string, and even the current C++03 standard requires returning a contiguous buffer when you call data() or &str[0] (though it won't necessarily be null-terminated). See here for more details.
That still doesn't make it safe to write to the string, though, since the standard doesn't force implementations to actually return their internal buffer when you call data(), c_str() or operator[], and neither are they prevented from using optimizations like copy-on-write, which may complicate things further (it appears that the new C++0x will ban copy-on-write, though). That being said, if you don't care about maximum portability, you can check your target implementation and see what it actually does inside. AFAIK, Visual C++ 2008/2010 always returns the real internal buffer pointer, and doesn't do copy-on-write (it does have the Small String Optimization, but that's probably not a concern).
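
On a compiler that implements C++11 or later, where contiguous storage is required, the resize-then-write approach mentioned in this thread can be sketched as follows. This is only an illustration under that assumption: `format_int` is an invented helper, and `std::snprintf` stands in for whatever C function uses its `char*` argument as an out parameter.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Lets a C function (std::snprintf) write directly into the string's storage.
// Assumes C++11 or later, where std::string storage is contiguous.
std::string format_int(int value) {
    std::string s(32, '\0');   // room for the digits plus the terminating '\0'
    // snprintf writes at most s.size() bytes, including the '\0',
    // so it stays inside the range [&s[0], &s[0] + s.size()).
    int n = std::snprintf(&s[0], s.size(), "%d", value);
    assert(n >= 0 && static_cast<std::size_t>(n) < s.size());
    s.resize(static_cast<std::size_t>(n));   // trim to what was actually written
    return s;
}
```

For example, `format_int(42)` yields the string `"42"`. On a pre-C++11 implementation, the caveats about copy-on-write and non-contiguous storage still apply.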
Basically, there are three points that are important:
1. According to the still current standard, `std::string` isn't actually guaranteed to use contiguous storage (as far as I know this is due to change). In practice, all current implementations probably use contiguous storage anyway. Still, because the guarantee is missing, `c_str()` (and `data()`) may actually create a copy of the string internally …
2. The pointer returned by `c_str()` (and `data()`) is valid only as long as no non-const methods are invoked on the original string. This makes it unsuitable when the C function hangs on to the pointer (as opposed to only using it for the duration of the actual function call).
3. If there is any chance at all that the string is going to be modified, casting away constness from the result of `c_str()` is not a good idea. You must create a buffer with a copy of the string, and pass that to the C function. If you create a buffer, remember to add the null terminator.
When the C function does not alter the string behind the `char*`, you can use `std::string::c_str()` for both const and non-const `std::string` instances. Ideally the parameter would be a `const char*`, but if it's not (because of a legacy API) you may legally use a `const_cast`. But you may only use the pointer from `c_str()` as long as you're not modifying the string!
When the C function does alter the string behind the `char*`, your only safe and portable way to use the `std::string` is to copy it to a temporary buffer yourself (for example from `c_str()`)! Make sure you free the temporary memory afterwards, or use `std::vector`, which is guaranteed to have contiguous memory.
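
The copy-to-a-buffer approach with `std::vector` might look like the sketch below. `c_uppercase` is a hypothetical stand-in for a legacy C function that modifies its argument in place, and `call_with_copy` is an invented wrapper name; neither comes from a real API.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-in for a legacy C function that modifies its argument.
void c_uppercase(char* text) {
    for (; *text != '\0'; ++text) {
        *text = static_cast<char>(std::toupper(static_cast<unsigned char>(*text)));
    }
}

std::string call_with_copy(const std::string& input) {
    // Copy the characters, including the terminating '\0' that c_str() provides.
    std::vector<char> buffer(input.c_str(), input.c_str() + input.size() + 1);
    assert(buffer.back() == '\0');
    c_uppercase(buffer.data());          // C++11; use &buffer[0] with older compilers
    return std::string(buffer.data());   // reads up to the first '\0'
}
```

The vector releases its memory automatically, so there is nothing to free by hand, and the original string is never touched. Note that the result is rebuilt from a null-terminated view, so anything after an embedded zero byte is dropped.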