Vector's new method data() provides a const and a non-const version. However, string's data() method only provides a const version.
I think they changed the wording about std::string so that the chars are now required to be contiguous (like std::vector).
Was std::string::data just missed? Or is there a good reason to only allow const access to a string's underlying characters?
Note: std::vector::data has another nice feature: it's not undefined behavior to call data() on an empty vector, whereas &vec.front() is undefined behavior if the vector is empty.
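To illustrate the note about empty vectors, here is a minimal sketch. The helper names sum_raw and sum_vector are hypothetical, made up for this example; the point is only that data() can be passed along safely even when the vector is empty, as long as the pointer is never dereferenced.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical C-style helper: sums n ints starting at p.
// It never dereferences p when n == 0.
int sum_raw(const int* p, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += p[i];
    }
    return total;
}

// Hypothetical wrapper around the helper above.
int sum_vector(const std::vector<int>& v) {
    // v.data() is well-defined for an empty vector (the result may or may
    // not be null, and must not be dereferenced).
    // &v.front() would be undefined behavior here when v is empty.
    return sum_raw(v.data(), v.size());
}
```

Calling sum_vector on an empty vector simply yields 0, with no special case needed at the call site.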
In C++98/03 there was good reason to not have a non-const data(), due to the fact that string was often implemented as COW. A non-const data() would have required a copy to be made if the refcount was greater than 1. While possible, this was not seen as desirable in C++98/03.
In Oct. 2005 the committee voted in LWG 464, which added the const and non-const data() to vector, and added const and non-const at() to map. At that time, string had not been changed so as to outlaw COW. But later, by C++11, a COW string is no longer conforming. The string spec was also tightened up in C++11 such that it is required to be contiguous, and there's always a terminating null exposed by operator[](size()). In C++03, the terminating null was only guaranteed by the const overload of operator[].
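As a small sketch of the C++11 guarantee described above, the following reads the element at index size() through the non-const operator[]. The function name terminator_of is hypothetical. It only reads the terminator and does not write to it.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: reads the element one past the last character
// through the NON-const operator[].
char terminator_of(std::string& s) {
    // Since C++11, s[s.size()] is defined for both overloads and refers
    // to a value-initialized character, i.e. '\0' for std::string.
    return s[s.size()];
}
```

Under C++03 rules, only the const overload made this promise, so the same read through a non-const string was not guaranteed to work.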
So in short, a non-const data() looks a lot more reasonable for a C++11 string. To the best of my knowledge, it was never proposed.
Update
charT* data() noexcept; was added to basic_string in the C++1z working draft N4582 by David Sankel's P0272R1 at the Jacksonville meeting in Feb. 2016.
Nice job David!
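For a sense of what the non-const overload enables, here is a rough sketch of filling a string through a C-style API. The function copy_from_c is hypothetical, and std::memcpy merely stands in for whatever C function would write into the buffer. The preprocessor check is an assumption about how one might keep the code compiling on pre-C++17 toolchains, where &out[0] is the usual workaround.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: copies a C string into a std::string by writing
// directly into the string's own buffer.
std::string copy_from_c(const char* src) {
    // Size the string first so the buffer actually exists.
    std::string out(std::strlen(src), '\0');
#if __cplusplus >= 201703L
    char* buf = out.data();   // non-const overload added by P0272R1
#else
    char* buf = &out[0];      // pre-C++17 workaround
#endif
    std::memcpy(buf, src, out.size());
    return out;
}
```

Note that the string is sized before anything is written; writing through data() never grows the string by itself.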
Historically, string's data() has been const-only because a non-const version would have prevented several common optimizations, like copy-on-write (COW). COW is now, IIANM, far less common, because it behaves badly with multithreaded programs.
BTW, yes they are now required to be contiguous:
[string.require].5: The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size().
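The quoted identity can be checked directly. The sketch below is only an illustration (the function name is_contiguous is made up here); on a conforming C++11 implementation it should return true for every string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical checker for the identity quoted from [string.require]:
// &*(s.begin() + n) == &*s.begin() + n for all 0 <= n < s.size().
bool is_contiguous(const std::string& s) {
    if (s.empty()) {
        return true;  // nothing to check, and begin() must not be dereferenced
    }
    for (std::size_t n = 0; n < s.size(); ++n) {
        const auto off = static_cast<std::string::difference_type>(n);
        if (&*(s.begin() + off) != &*s.begin() + off) {
            return false;
        }
    }
    return true;
}
```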
Another reason might be to avoid code such as:
std::string ret;
strcpy(ret.data(), "whatthe...");
Or any other function that returns a preallocated char array.
Although I'm not that well-versed in the standard, it might be due to the fact that std::string doesn't need to contain null-terminated data (but it can), and it doesn't need to contain an explicit length field (but it can). So changing the underlying data and, e.g., adding a '\0' in the middle might get the string's length field out of sync with the actual char data and thus leave the object in an invalid state.
@Christian Rau
From the time the original Plauger string class (around 1995, I think) was STL-ized by the committee (turned into a Sequence, templatified), std::string has always been std::vector plus string-related stuff (conversion from/to 0-terminated, concatenation, ...), plus some oddities, like COW that's actually "Copy on Write and on non-const begin()/end()/operator[]".
But ultimately a std::string is really a std::vector under another name, with a slightly different focus and intent. So:
- just like std::vector, std::string has either a size data member or both start and end data members;
- just like std::vector, std::string does not care about the value of its elements, embedded NUL or others.
std::string is not a C string with syntax sugar, utility functions and some encapsulation, just like std::vector<T> is not T[] with syntax sugar, utility functions and some encapsulation.
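A short sketch of the embedded-NUL point: writing a '\0' into the middle of a std::string does not change its size, because the length is tracked separately from the character values. The function name with_embedded_nul is made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: builds a six-character string and overwrites
// the fourth character with NUL.
std::string with_embedded_nul() {
    std::string s = "abcdef";
    s[3] = '\0';  // an ordinary element write; the string's size is untouched
    return s;
}
```

Only C functions that scan for a terminator, such as std::strlen on c_str(), see a shorter string; the std::string object itself stays in a valid state.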