I have seen it stated many times that std::string::operator[] does not do any bounds checking. Even in "What is the difference between string::at and string::operator[]?", asked in 2013, the answers say that operator[] does not do any bounds checking.
My issue with this is that if I look at the standard (in this case draft N3797), [string.access] says:
const_reference operator[](size_type pos) const;
reference operator[](size_type pos);
- Requires: pos <= size().
- Returns: *(begin() + pos) if pos < size(). Otherwise, returns a reference to an object of type charT with value charT(), where modifying the object leads to undefined behavior.
- Throws: Nothing.
- Complexity: constant time.
This leads me to believe that operator[] has to do some sort of bounds checking to determine whether it needs to return an element of the string or a default charT. Is this assumption correct, and is operator[] now required to do bounds checking?
The wording is slightly confusing, but if you study it in detail you'll find that it's actually very precise.
It says this:
- The precondition is that the argument to [] is either equal to n or less than n, where n is size().
- Assuming that precondition is satisfied:
- If it's < n then you get the character you asked for.
- "Otherwise" (i.e. if it's n) then you get charT() (i.e. the null character).
But no rule is defined for when you break the precondition, and the check for = n can be satisfied implicitly (but isn't explicitly mandated to be) by actually storing a charT() at position n.
So implementations don't need to perform any bounds checking… and the common ones won't.
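As a small illustration of the one-past-the-end case described above, the sketch below reads index size(), which is still inside the precondition. The helper name reads_terminator_at_size is made up for this example, and it assumes a C++11 or later library.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper for illustration only.
// Returns true if reading index size() yields charT(), i.e. '\0' for char.
// pos == size() satisfies "Requires: pos <= size()", so this is well defined.
bool reads_terminator_at_size(const std::string& s) {
    return s[s.size()] == '\0';
}
```

Calling it with "hello" or with an empty string should return true in both cases. Passing size() + 1 to operator[] instead would break the precondition, so nothing can be said about the result.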
"operator[] has to do some sort of bounds checking to determine..."
No, it doesn't. Given the precondition
Requires: pos <= size().
it can simply assume that pos always refers to a valid position in the string. If the condition isn't met, the behaviour is undefined.
The operator[] will likely just advance a pointer from the start of the string by pos. If the string is shorter than that, it simply returns a reference to whatever data lies beyond the string, like a classic out-of-bounds access on a plain C array.
To fulfil the case where pos == size(), it could simply allocate an extra charT at the end of its internal string data. Then just advancing the pointer, without any checks, would still deliver the stated behaviour.
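A minimal sketch of that idea, using a made-up TinyString class (it is not how any particular standard library is written, just the simplest layout that matches the description): the buffer holds size() + 1 characters and operator[] never compares pos against anything.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical toy class for illustration, not a real library implementation.
class TinyString {
public:
    explicit TinyString(const char* text)
        : size_(std::strlen(text)),
          data_(new char[size_ + 1])   // one extra slot beyond the characters
    {
        std::memcpy(data_.get(), text, size_);
        data_[size_] = char();         // store charT() at position size()
    }

    std::size_t size() const { return size_; }

    // No comparison against size_: the caller must guarantee pos <= size().
    // For pos == size() this lands on the stored terminator.
    const char& operator[](std::size_t pos) const { return data_[pos]; }

private:
    std::size_t size_;                 // declared first, so initialised first
    std::unique_ptr<char[]> data_;
};
```

With TinyString t("abc"), t[3] yields the stored terminator without any branch, while t[4] would read outside the buffer, which is exactly the undefined behaviour the Requires clause allows.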
First, there is a Requires clause: pos <= size(). If you violate the Requires clause, your program behaves in an undefined manner.
So the standard only defines what happens when that condition holds.
The next paragraph states that for pos < size(), it returns a reference to an element in the string. And for pos == size(), it returns a reference to a default-constructed charT with value charT().
While this may look like bounds checking, in practice what actually happens is that the std::basic_string allocates a buffer one larger than asked for and populates the last entry with a charT(). Then [] simply does pointer arithmetic.
I have tried to come up with a way to avoid that implementation. While the standard does not mandate it, I could not convince myself an alternative exists. There was something annoying with .data() that made it difficult to avoid the single buffer.
This operator of the standard containers emulates the behavior of operator [] on ordinary arrays, so it does not make any checks. However, in debug mode a library implementation can provide this checking.
If you want the index to be checked, use the member function at() instead.
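To make the contrast with at() concrete, here is a short sketch. The helper name at_throws is invented for this example; the only library facts it relies on are that at() throws std::out_of_range when pos >= size().

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper for illustration only.
// Returns true if s.at(pos) throws std::out_of_range, false otherwise.
bool at_throws(const std::string& s, std::string::size_type pos) {
    try {
        (void)s.at(pos);               // checked access
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

Note that at() rejects pos == size() as well, so at_throws("abc", 3) is true even though operator[] accepts that index.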
http://en.cppreference.com/w/cpp/string/basic_string/operator_at
Returns a reference to the character at specified location pos. No
bounds checking is performed.
(Emphasis mine).
If you want bounds checking, use std::basic_string::at
The standard does not imply that the implementation needs to provide bounds checking, because it basically describes what an unchecked array access does.
If you access within bounds, it's defined. If you step outside, you trigger undefined behavior.