My code is like this:
string s = "abc";
char* pc = const_cast<char*>( s.c_str() );
pc[ 1 ] = 'x';
cout << s << endl;
When I compiled the snippet above using GCC, I got the result "axc" as expected. My question is: is it safe and portable to modify the underlying char array of a C++ string in this way? Or are there alternative approaches to manipulating a string's data directly?
FYI, my intention is to write some pure C functions that can be called from both C and C++, so they can only accept char* as arguments. Converting from char* to string involves copying, and I'd rather avoid that penalty. Could anybody suggest how to deal with this sort of situation?
As others have said, it is not portable. But there are more dangers: some std::string implementations (I know GCC's does) use COW (copy on write), so writing through that pointer can also change other strings that share the same buffer.
This is relying on undefined behaviour, and is therefore not portable.
You should not mess with the underlying string. At the end of the day, string is an object; would you mess with any other object this way?
Have you profiled your code to see whether there actually is a penalty?
This depends on your standard library implementation. In GNU's libstdc++, std::string is implemented using a copy-on-write (CoW) pattern. Thus, if multiple std::string objects are copies of one another, they will internally all point to the same data, and if you modify any of them in the way you show in your question, the content of all of those (seemingly) unrelated std::string objects will change.
On Windows, I think the implementation doesn't use CoW; I'm not sure what would happen there.
Anyway, it's undefined behavior, so I'd stay clear of it. Chances are, even if you get it working, you'll eventually start running into very hard-to-trace bugs.
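If the aim is just to change characters from C++ code, the string's own non-const interface (operator[], at(), replace()) is the portable route, and it stays correct under copy-on-write: a CoW implementation has to give the string its own buffer before handing out a writable reference, so copies keep their original contents. A minimal sketch follows; set_char and modify_with_copy are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: change one character through std::string's own
// non-const interface instead of writing through c_str().
// at() throws std::out_of_range if pos >= s.size().
void set_char(std::string& s, std::size_t pos, char c) {
    s.at(pos) = c;
}

// Modify a string that has a copy. The copy keeps its original value even on
// copy-on-write implementations, because the non-const accessor unshares the
// buffer before returning a writable reference.
std::pair<std::string, std::string> modify_with_copy() {
    std::string s = "abc";
    std::string t = s;      // may share storage with s under CoW
    set_char(s, 1, 'x');
    return {s, t};          // s == "axc", t == "abc"
}
```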
(a) This is not necessarily the underlying string. Under C++03, std::string::c_str() is allowed to return a pointer to a separate copy of the data (in practice it usually points into the string's own buffer, and C++0x requires that it does).
(b) const_casting away the constness only changes the pointer's type; it does not make the array writable. The standard says the program shall not alter the values in the array that c_str() returns, so modifying it is Undefined Behaviour. Very bad.
Simply speaking, do not do this.
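If a C function needs a writable, nul-terminated char* and you want to rely only on C++03 guarantees, one alternative (not suggested in the answers here) is to copy the string into a std::vector<char>, whose storage is contiguous in C++03, pass &buf[0], and copy the result back. A minimal sketch, assuming the C function does not change the length; c_upcase and upcase_via_buffer are hypothetical names. This costs a copy in each direction, which is the overhead the question hopes to avoid, so it is mainly a fallback.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical C-style function: upper-cases a nul-terminated string in place.
// In real code this would live in a C file and be declared extern "C".
void c_upcase(char* p) {
    for (; *p != '\0'; ++p)
        *p = static_cast<char>(std::toupper(static_cast<unsigned char>(*p)));
}

// Copy into a writable buffer, let the C function modify it, copy back.
void upcase_via_buffer(std::string& s) {
    std::vector<char> buf(s.begin(), s.end());
    buf.push_back('\0');                  // C functions expect a terminator
    c_upcase(&buf[0]);                    // buf is never empty, so &buf[0] is valid
    s.assign(buf.begin(), buf.end() - 1); // drop the terminator; length unchanged
}
```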
Can you use &myString[0] at all? It has a non-const version; then again, it's stated to be the same as data()[0], which has no non-const version. Someone with a decent library reference to hand can clear this up.
To the first part: c_str() returns const char*, and it means what it says. All the const_cast achieves in this case is that your undefined behavior compiles.
To the second part: in C++0x, std::string is guaranteed to have contiguous storage, just like std::vector in C++03. Therefore you could use &s[0] to get a char* to pass to your functions, as long as the string isn't empty. In practice, all string implementations currently in active development already have contiguous storage: there was a straw poll at a standards committee meeting and nobody offered a counter-example. So you can use this feature now if you like.
However, std::string uses a fundamentally different format from C-style strings: it's data+length rather than nul-terminated. If you modify the string data from your C functions, then you can't change the length of the string, and you can't be sure there's a nul byte at the end without calling c_str(). And std::string can contain embedded nuls that are part of the data, so even if you did find a nul, without knowing the length you still don't know that you've found the end of the string. You're very limited in what you can do in functions that must operate correctly on both kinds of data.
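The data+length point suggests giving the C functions an explicit length parameter instead of relying on a terminator. A minimal sketch, assuming C++11 or later (or an implementation with contiguous storage, as described above); rot13_n and rot13 are hypothetical names. The C-side function uses only C-compatible constructs; in a real project its declaration would sit in a shared header behind #ifdef __cplusplus / extern "C".

```cpp
#include <stddef.h>
#include <string>

// Hypothetical C-compatible function: rotates ASCII letters by 13 in place.
// It takes an explicit length, so it handles embedded nuls and never reads
// past the end. extern "C" gives it C linkage when compiled as C++.
extern "C" void rot13_n(char* p, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        char c = p[i];
        if (c >= 'a' && c <= 'z')
            p[i] = (char)('a' + (c - 'a' + 13) % 26);
        else if (c >= 'A' && c <= 'Z')
            p[i] = (char)('A' + (c - 'A' + 13) % 26);
    }
}

// C++ caller: no copy. The length comes from s.size(), not strlen, so
// embedded nuls are processed too. The empty check keeps &s[0] valid
// even on pre-C++11 implementations.
void rot13(std::string& s) {
    if (!s.empty())
        rot13_n(&s[0], s.size());
}
```

The same rot13_n can be called from C with a char array and strlen(buf), or with any buffer whose length is tracked separately.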