It had been my understanding that copy-on-write is not a viable way to implement a conforming std::string
in C++11, but when it came up in discussion recently I found myself unable to directly support that statement.
Am I correct that C++11 does not admit COW based implementations of std::string
?
If so, is this restriction explicitly stated somewhere in the new standard (where)?
Or is this restriction implied, in the sense that it is the combined effect of the new requirements on std::string
that precludes a COW based implementation of std::string
. In this case, I'd be interested in a chapter and verse style derivation of 'C++11 effectively prohibits COW based std::string
implementations'.
From 21.4.2 basic_string constructors and assignment operators [string.cons]
Table 64 helpfully documents that after construction of an object via this (copy) constructor,
this->data()
has as value:There are similar requirements for other similar constructors.
Since it is now guaranteed that strings are stored contiguously and you are now allowed to take a pointer to the internal storage of a string, (i.e. &str[0] works like it would for an array), it's not possible to make a useful COW implementation. You would have to make a copy for way too many things. Even just using
operator[]
orbegin()
on a non-const string would require a copy.Is COW
basic_string
prohibited in C++11 and later?Regarding
Yes.
Regarding
Almost directly, by requirements of constant complexity for a number of operations that would require O(n) physical copying of the string data in a COW implementation.
For example, for the member functions
… which in a COW implementation would ¹both trigger string data copying to un-share the string value, the C++11 standard requires
C++11 §21.4.5/4:… which rules out such data copying, and hence, COW.
C++03 supported COW implementations by not having these constant complexity requirements, and by, under certain restrictive conditions, allowing calls to
operator[]()
,at()
,begin()
,rbegin()
,end()
, orrend()
to invalidate references, pointers and iterators referring to the string items, i.e. to possibly incur a COW data copying. This support was removed in C++11.Is COW also prohibited via the C++11 invalidation rules?
In another answer which at the time of writing is selected as solution, and which is heavily upvoted and therefore apparently believed, it's asserted that
That assertion is incorrect and misleading in two main ways:
const
item accessors need to trigger a COW data copying.But also the
const
item accessors need to trigger data copying, because they allow client code to form references or pointers that (in C++11) it's not permitted to invalidate later via the operations that can trigger COW data copying.But in a correct implementation COW data copying, un-sharing the string value, is done at a point before there are any references that can be invalidated.
To see how a correct C++11 COW implementation of
basic_string
would work, when the O(1) requirements that make this invalid are ignored, think of an implementation where a string can switch between ownership policies. A string instance starts out with policy Sharable. With this policy active there can be no external item references. The instance can transition to Unique policy, and it must do so when an item reference is potentially created such as with a call to.c_str()
(at least if that produces a pointer to the internal buffer). In the general case of multiple instances sharing ownership of the value, this entails copying the string data. After that transition to Unique policy the instance can only transition back to Sharable by an operation that invalidates all references, such as assignment.So, while that answer's conclusion, that COW strings are ruled out, is correct, the reasoning offered is incorrect and strongly misleading.
I suspect the cause of this misunderstanding is a non-normative note in C++11's annex C:
C++11 §C.2.11 [diff.cpp03.strings], about §21.3:Here the rationale explains the primary why one decided to remove the C++03 special COW support. This rationale, the why, is not how the standard effectively disallows COW implementation. The standard disallows COW via the O(1) requirements.
In short, the C++11 invalidation rules don't rule out a COW implementation of
C++03 §21.3/5 which includes “first call” COW support:std::basic_string
. But they do rule out a reasonably efficient unrestricted C++03-style COW implementation like the one in at least one of g++'s standard library implementations. The special C++03 COW support allowed practical efficiency, in particular usingconst
item accessors, at the cost of subtle, complex rules for invalidation:These rules are so complex and subtle that I doubt many programmers, if any, could give a precise summary. I could not.
What if O(1) requirements are disregarded?
If the C++11 constant time requirements on e.g.
operator[]
are disregarded, then COW forbasic_string
could be technically feasible, but difficult to implement.Operations which could access the contents of a string without incurring COW data copying include:
+
.<<
.basic_string
as argument to standard library functions.The latter because the standard library is permitted to rely on implementation specific knowledge and constructs.
Additionally an implementation could offer various non-standard functions for accessing string contents without triggering COW data copying.
A main complicating factor is that in C++11
basic_string
item access must trigger data copying (un-sharing the string data) but is required to not throw, e.g. C++11 §21.4.5/3 “Throws: Nothing.”. And so it can't use ordinary dynamic allocation to create a new buffer for COW data copying. One way around this is to use a special heap where memory can be reserved without being actually allocated, and then reserve the requisite amount for each logical reference to a string value. Reserving and un-reserving in such a heap can be constant time, O(1), and allocating the amount that one has already reserved, can benoexcept
. In order to comply with the standard's requirements, with this approach it seems there would need to be one such special reservation-based heap per distinct allocator.Notes:
¹ The
const
item accessor triggers a COW data copying because it allows the client code to obtain a reference or pointer to the data, which it's not permitted to invalidate by a later data copying triggered by e.g. the non-const
item accessor.It's not allowed, because as per the standard 21.4.1 p6, invalidation of iterators/references is only allowed for
For a COW string, calling non-const
operator[]
would require making a copy (and invalidating references), which is disallowed by the paragraph above. Hence, it's no longer legal to have a COW string in C++11.The answers by Dave S and gbjbaanb are correct. (And Luc Danton's is correct too, although it's more a side-effect of forbidding COW strings rather than the original rule that forbids it.)
But to clear up some confusion, I'm going to add some further exposition. Various comments link to a comment of mine on the GCC bugzilla which gives the following example:
The point of that example is to demonstrate why GCC's reference counted (COW) string is not valid in C++11. The C++11 standard requires this code to work correctly. Nothing in the code permits the
p
to be invalidated in C++11.Using GCC's old reference-counted
std::string
implementation, that code has undefined behaviour, becausep
is invalidated, becoming a dangling pointer. (What happens is that whens2
is constructed it shares the data withs
, but obtaining a non-const reference vias[0]
requires the data to be unshared, sos
does a "copy on write" because the references[0]
could potentially be used to write intos
, thens2
goes out of scope, destroying the array pointed to byp
).The C++03 standard explicitly permits that behaviour in 21.3 [lib.basic.string] p5 where it says that subsequent to a call to
data()
the first call tooperator[]()
may invalidate pointers, references and iterators. So GCC's COW string was a valid C++03 implementation.The C++11 standard no longer permits that behaviour, because no call to
operator[]()
may invalidate pointers, references or iterators, irrespective of whether they follow a call todata()
.So the example above must work in C++11, but does not work with libstdc++'s kind of COW string, therefore that kind of COW string is not permitted in C++11.
It is, CoW is an acceptable mechanism for making faster strings... but...
it makes multithreading code slower (all that locking to check if you're the only one writing kills performance when using a lot of strings). This was the main reason CoW was killed off years ago.
The other reasons are that the
[]
operator will return you the string data, without any protection for you to overwrite a string someone else expects to be unchanging. The same applies toc_str()
anddata()
.Quick google says that the multithreading is basically the reason it was effectively disallowed (not explicitly).
The proposal says :
followed by
Ropes are part of STLPort and SGIs STL.