I have read in several places that the difference between c_str() and data() (in STL and other implementations) is that c_str() is always null terminated while data() is not.
As far as I have seen in actual implementations, they either do the same thing or data() calls c_str().
What am I missing here? Which one is more correct to use in which scenarios?
Even though you have seen that they do the same thing, or that .data() calls .c_str(), it is not correct to assume that this will be the case for other compilers. It is also possible that your compiler will change in a future release.
There are two ways to use std::string, because it can hold both text and arbitrary binary data:
1. Plain text.
2. Arbitrary binary data, which may contain embedded null characters.
You should use the .c_str() method when you are using your string as in case 1.
You should use the .data() method when you are using your string as in case 2. This is not because it is dangerous to use .c_str() in these cases, but because it makes it explicit to others reviewing your code that you are working with binary data.
Possible pitfall with using .data()
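Passing the result of .data() to a function that expects a null-terminated string, such as strcpy(), is wrong and could cause a segfault in your program on implementations where .data() is not null terminated (which C++03 permits).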
Why is it common for implementers to make .data() and .c_str() do the same thing?
Because it is more efficient to do so. If .data() returned a buffer without a terminator, then .c_str() would have to copy the internal buffer to add one, or the implementation would have to maintain two buffers. Keeping a single buffer that is always null terminated means std::string can be implemented with just one internal buffer.
Quote from ANSI ISO IEC 14882 2003 (C++03 Standard):
All the previous comments are consistent, but I'd also like to add that starting in C++17, str.data() returns a char* instead of a const char* when str is a non-const string (the const overload still returns const char*).
In C++11/C++0x, data() and c_str() are no longer different, and thus data() is required to have a null terminator at the end as well.
The documentation is correct. Use c_str() if you want a null-terminated string. If the implementers happened to implement data() in terms of c_str(), you don't have to worry; still use data() if you don't need the string to be null terminated, because in some pre-C++11 implementations it may turn out to perform better than c_str().
Strings don't necessarily have to be composed of character data; they could be composed of elements of any type. In those cases data() is more meaningful. c_str(), in my opinion, is only really useful when the elements of your string are character based.
Extra: from C++11 onwards, both functions are required to be the same, i.e. data() is now required to be null terminated. According to cppreference: "The returned array is null-terminated, that is, data() and c_str() perform the same function."
It has been answered already; some notes on the purpose: freedom of implementation.
std::string operations (e.g. iteration, concatenation and element mutation) don't need the zero terminator. Unless you pass the string to a function expecting a zero-terminated string, it can be omitted.
This would allow an implementation to have substrings share the actual string data: string::substr could internally hold a reference to the shared string data plus the start/end range, avoiding the copy (and additional allocation) of the actual string data. The implementation would defer the copy until you call c_str or modify any of the strings. No copy would ever be made if the strings involved are only read. (Copy-on-write implementations aren't much fun in multithreaded environments, plus the typical memory/allocation savings aren't worth the more complex code today, so it's rarely done.)
Similarly, string::data allows a different internal representation, e.g. a rope (a linked list of string segments). This can improve insert/replace operations significantly. Again, the list of segments would have to be collapsed into a single segment when you call c_str or data.