Why are std::vector::data and std::string::data di

2019-01-11 17:03发布

问题:

Vector's new method data() provides a const and non-const version.
However string's data() method only provides a const version.

I think they changed the wording about std::string so that the chars are now required to be contiguous (like std::vector).

Was std::string::data just missed? Or is the a good reason to only allow const access to a string's underlying characters?

note: std::vector::data has another nice feature, it's not undefined behavior to call data() on an empty vector. Whereas &vec.front() is undefined behavior if it's empty.

回答1:

In C++98/03 there was good reason to not have a non-const data() due to the fact that string was often implemented as COW. A non-const data() would have required a copy to be made if the refcount was greater than 1. While possible, this was not seen as desirable in C++98/03.

In Oct. 2005 the committee voted in LWG 464 which added the const and non-const data() to vector, and added const and non-const at() to map. At that time, string had not been changed so as to outlaw COW. But later, by C++11, a COW string is no longer conforming. The string spec was also tightened up in C++11 such that it is required to be contiguous, and there's always a terminating null exposed by operator[](size()). In C++03, the terminating null was only guaranteed by the const overload of operator[].

So in short a non-const data() looks a lot more reasonable for a C++11 string. To the best of my knowledge, it was never proposed.

Update

charT* data() noexcept;

was added basic_string in the C++1z working draft N4582 by David Sankel's P0272R1 at the Jacksonville meeting in Feb. 2016.

Nice job David!



回答2:

Historically, the string data has not been const because it would prevent several common optimizations, like copy-on-write (COW). This is now, IIANM, far less common, because it behaves badly with multithreaded programs.

BTW, yes they are now required to be contiguous:

[string.require].5: The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size().

Another reason might be to avoid code such as:

std::string ret;
strcpy(ret.data(), "whatthe...");

Or any other function that returns a preallocated char array.



回答3:

Although I'm not that well-versed in the standard, it might be due to the fact that std::string doesn't need to contain null-terminated data, but it can and it doesn't need to contain an explicit length field, but it can. So changing the undelying data and e.g. adding a '\0' in the middle might get the strings length field out of sync with the actual char data and thus leave the object in an invalid state.



回答4:

@Christian Rau

From the time the original Plauger (around 1995 I think) string class was STL-ized by the committee (turned into a Sequence, templatified), std::string has always been std::vector plus string-related stuff (conversion from/to 0-terminated, concatenation, ...), plus some oddities, like COW that's actually "Copy on Write and on non-const begin()/end()/operator[]".

But ultimately a std::string is really a std::vector under another name, with a slightly different focus and intent. So:

  • just like std::vector, std::string has either a size data member or both start and end data members;
  • just like std::vector, std::string does not care about the value of its elements, embedded NUL or others.

std::string is not a C string with syntax sugar, utility functions and some encapsulation, just like std::vector<T> is not T[] with syntax sugar, utility functions and some encapsulation.