Does g++ meet std::string C++11 requirements?

Posted 2020-02-01 02:30

Question:

Consider the following example:

#include <iostream>
#include <string>
using namespace std;

int main()
{
    string x = "hello";
    // The copy constructor is called here.
    string y(x);
    // c_str() returns const char*, but this cast is quite popular.
    char* temp = (char*)y.c_str();

    temp[0] = 'p';

    cout << "x = " << x << endl;
    cout << "y = " << y << endl;

    cin >> x;
    return 0;
}

Run it with the Visual Studio compiler and with g++. When I did so, I got two different results.
In g++:

x = pello
y = pello

In Visual Studio 2010:

x = hello
y = pello

The reason for the difference is most likely that g++'s std::string implementation uses COW (copy-on-write) techniques and Visual Studio's does not.

Now, the C++ standard (page 616, table 64) states the following about the string copy constructor:

basic_string(const basic_string& str):

Effects:
data() "points at the first element of an allocated copy of the array whose first element is pointed at by str.data()"

This means COW is not allowed (at least to my understanding).
How can that be?
Does g++ meet the std::string C++11 requirements?

Before C++11 this did not pose a big problem since c_str didn't return a pointer to the actual data the string object holds, so changing it didn't matter. But after the change, this combination of COW and returning the actual pointer can and does break old applications (applications that deserve it for bad coding, but nevertheless).

Do you agree with me? If yes, can something be done? Does anyone have an idea about how to approach this in a very big, old code environment (a clockwork rule to catch this would be nice)?

Note that even without casting the constness away, one might invalidate a pointer by calling c_str, saving the pointer, and then calling a non-const method (which will cause a write).
Another example, without casting the constness away:

#include <iostream>
#include <string>
using namespace std;

int main()
{
    string x = "hello";
    // The copy constructor is called here.
    string y(x);

    //y[0] = 'p';

    // c_str() returns const char*, but this usage is quite popular.
    const char* temp = y.c_str();

    y[0] = 'p';

    // Now we expect "pello" because the standard says the pointer points to the actual data,
    // but we will get "hello".
    cout << "temp = " << temp << endl;

    return 0;
}

Answer 1:

You're right that COW is disallowed. But GCC hasn't updated its implementation yet, allegedly due to ABI constraints. A new implementation, designed eventually to supplant the std::string implementation, can be found as ext/vstring.h.

A bug in libstdc++'s std::string, albeit not this one, is not going to make it into GCC 4.9; Jonathan indicates on the bug that it has only been fixed for vstring so far. My guess would be, then, that the COW issue would be resolved around the same time.

Despite all this, casting away constness then mutating is pretty much always a bad idea: though you're correct that this should in practice be safe with a fully C++11-compliant string implementation, you're making assumptions and this very problem proves that you cannot always rely on those assumptions to hold. So, while your code example may be "popular", it's popular in poor code, and shouldn't be written even now. And, of course, writing that in C++03 is flat-out incompetence!
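As an illustration of the alternative, here is a minimal sketch (the function name `mutate_copy_safely` is made up for this example) that changes the copy through the non-const operator[] instead of through a cast pointer. A COW implementation gets the chance to unshare the buffer before the write, so x should be left alone with either kind of string:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper for illustration only: repeats the question's scenario,
// but writes through the string's own non-const interface.
std::pair<std::string, std::string> mutate_copy_safely() {
    std::string x = "hello";
    std::string y(x);      // may share a buffer with x in a COW implementation
    assert(!y.empty());    // y[0] below refers to a real character
    y[0] = 'p';            // non-const operator[]: a COW string unshares here
    return std::make_pair(x, y);
}
```

Calling it and printing `first` and `second` of the result gives the original x and the modified y, without any cast.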



Answer 2:

libstdc++'s implementation does not conform to C++11, but that doesn't mean your code is guaranteed to produce the results you expect.

Doing anything to modify the values stored in the character array returned by c_str() results in undefined behavior. The standard explicitly says this:

21.4.7.1 basic_string accessors

const charT* c_str() const noexcept;
const charT* data() const noexcept;
1 Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
2 Complexity: constant time.
3 Requires: The program shall not alter any of the values stored in the character array.

Although I quote C++11 above, this was also true of C++03.
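The Returns clause can be checked directly. A small sketch, with `c_str_aliases_elements` being a hypothetical helper name, that only reads through the pointer and never writes through it:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: checks p + i == &operator[](i) for each i in [0, size()],
// using only const access, so the "shall not alter" requirement is respected.
bool c_str_aliases_elements(const std::string& s) {
    const char* p = s.c_str();
    assert(p[s.size()] == '\0');  // c_str() is always NUL-terminated
    for (std::size_t i = 0; i <= s.size(); ++i) {
        if (p + i != &s[i]) {
            return false;
        }
    }
    return true;
}
```

Reading through the pointer like this is fine; it is only writing through it that the Requires clause forbids.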


Does anyone have an idea about how to approach this in a very big, old code environment (a clockwork rule to catch this would be nice)?

Hopefully you have a decent test suite. Making significant changes to large, legacy code-bases is not really practical otherwise. The easier and faster it is to run the test suite the easier and faster it will be to fix the code.

On a very large codebase, auditing all uses of c_str() may be very expensive. However, taking a sample and checking what sorts of uses are made of it, and what specific corrections could be applied, can help you gauge the scale of the problem. In my experience you can expect a wide variety of weird things, but some will be more common.

Valgrind, debug implementations of std::string, and other tools can help identify some instances which are likely to cause real bugs. Fixing those first is the high priority. The fixes will likely involve updating APIs to be const-correct or to have well defined lifetime requirements, and switching uses of c_str() for something that produces C strings with appropriate lifetimes. Your survey of the code should have informed you as to the general variety of lifetime requirements and c-string creating utilities that will be necessary.
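One possible shape for such a C-string-creating utility is sketched below. Both `CStringCopy` and `legacy_capitalize` are invented for this example (the latter stands in for an old API that insists on a writable char*); the idea is only that the buffer is a private copy whose lifetime is tied to an object you control, not to the std::string:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical utility: owns a writable, NUL-terminated copy of a string.
class CStringCopy {
public:
    explicit CStringCopy(const std::string& s)
        : buf_(s.c_str(), s.c_str() + s.size() + 1) {}  // +1 copies the terminator

    char* get() { return &buf_[0]; }
    const char* get() const { return &buf_[0]; }
    std::size_t length() const { return buf_.size() - 1; }

private:
    std::vector<char> buf_;
};

// Stand-in for a legacy C-style function that writes through its argument.
void legacy_capitalize(char* text) {
    assert(text != 0);
    if (text[0] != '\0') {
        text[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(text[0])));
    }
}
```

A call site then becomes `CStringCopy c(s); legacy_capitalize(c.get());`, and s itself is never touched, whatever the string implementation does internally.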

Other uses of c_str() can be modified incrementally over time as a lower priority, side activity.

Tools built on top of clang for refactoring or semantic search are another option for identifying problems and making large-scale changes. However, it's often a big task just to get legacy code into a legal enough shape for clang tools to process it. (Google has given a talk about some work they did on this, and more recent talks on commodity versions of this technology, which Google has made available.)


I often have a hard time convincing people that 'undefined behavior' is actually a problem even in instances when no ill effects are actually observed. As you write new code remember from this experience that the lives of future maintainers will be made much easier if you conform to the C++ spec. Even if some particular instance of 'bad' code doesn't cause you problems now, that is likely to change over time as compilers and library implementations change. And even when the spec changes, the committee is careful to consider the effects on conformant legacy code. If code isn't conformant then it really doesn't get any consideration and you end up with problems like this.



Answer 3:

Does g++ meet the std::string C++11 requirements?

No.

Before C++11 this did not pose a big problem since c_str didn't return a pointer to the actual data the string object holds, so changing it didn't matter.

This is incorrect: c_str was always allowed to return the actual data, and that's exactly what it did in all popular C++03 implementations.

But after the change, this combination of COW and returning the actual pointer can and does break old applications (applications that deserve it for bad coding, but nevertheless).

After what change? G++ did not change its std::string, so if your old program is broken using G++ then it was always broken.

Note that even without casting the constness away, one might invalidate a pointer by calling c_str, saving the pointer, and then calling a non-const method (which will cause a write).

Your second example doesn't demonstrate any invalidation, because in a COW implementation temp is still a valid pointer while x exists. But it's possible to modify the example so that temp is invalidated, and that's not allowed in C++11: [string.require]/6 says that y[0] is not allowed to invalidate the pointer returned by c_str().
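To see which behaviour a given library has, the second example can be turned into a small probe. This is only a sketch; `WriteProbe` and `probe_write_after_c_str` are names made up here. On a string that follows the C++11 rule the flag should come out true, while on a COW string it is expected to be false, because y moves to a fresh buffer and temp keeps pointing at the one x still owns:

```cpp
#include <cassert>
#include <string>

// Hypothetical result type for this illustration.
struct WriteProbe {
    bool pointer_still_current;  // does the saved pointer still refer to y's buffer?
    std::string x;
    std::string y;
};

WriteProbe probe_write_after_c_str() {
    std::string x = "hello";
    std::string y(x);
    const char* temp = y.c_str();  // saved before the write
    assert(temp[0] == 'h');
    y[0] = 'p';                    // C++11: must not invalidate temp
    WriteProbe r;
    // x is still alive here, so temp is valid to compare even in the COW case.
    r.pointer_still_current = (temp == y.c_str());
    r.x = x;
    r.y = y;
    return r;
}
```

In both cases x stays "hello" and y becomes "pello"; only the flag differs.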



Answer 4:

The other answers were correct at the time, but according to the GCC 5.x change log, libstdc++ as shipped with GCC 5 is now fully C++11 conformant.
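If you need to know which std::string a particular build is using, libstdc++ exposes the `_GLIBCXX_USE_CXX11_ABI` macro for its dual ABI. A rough sketch follows; the function name `libstdcxx_string_abi` and the label strings are my own, and I am assuming a missing or zero macro means the old reference-counted string:

```cpp
#include <cassert>
#include <string>  // any libstdc++ header defines the macros tested below

// Hypothetical helper: reports which std::string flavour this translation
// unit was compiled against.
const char* libstdcxx_string_abi() {
#if !defined(__GLIBCXX__)
    return "not libstdc++";
#elif defined(_GLIBCXX_USE_CXX11_ABI) && _GLIBCXX_USE_CXX11_ABI
    return "libstdc++, new ABI (non-COW std::string)";
#else
    return "libstdc++, old ABI (COW std::string)";
#endif
}
```

Printing the result at startup is a cheap way to confirm that a legacy build really picked up the new string.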