I have the following code that takes a string and erases non alphabet characters
void removeNonAlpha(string& str){
for (string::iterator it = str.begin(); it < str.end(); it++){
if (!(isUpperCaseLetter(*it) || isLowerCaseLetter(*it) || str == ' '))
str.erase(it--);
}
}
I showed this to my professor and he told me that doing this is risky because it may invalidate the iterator that I'm using. However, I thought that erase will only invalidate iterators after the point of the erase, and I made sure not to use any iterators after that point.
So could this code crash or cause any undefined behavior?
std::vector::erase
works as you suggest; it only invalidates iterators starting with the first erased element. However, that doesn't apply to std::string
.
C++ allows string iterators to be invalidated at the drop of a hat.
The C++ standard has traditionally been more flexible with the requirements for std::string
. (Or, in other words, it has traditionally allowed implementers to use optimizations which would not be valid for vectors.) And so it is with std::string::erase
, and other string mutators.
In [string.require]
(§21.4.1 of n3797), the standard accepts that:
- References, pointers, and iterators referring to the elements of a
basic_string
sequence may be invalidated by the following uses of that basic_string
object:
- as an argument to any standard library function taking a reference to non-const
basic_string
as an argument.
- Calling non-const member functions, except
operator[]
, at
, front
, back
, begin
, rbegin
, end
, and rend
.
In other words, calling a potentially mutating function like std::string::erase
could invalidate all iterators to that string, even if no visible modifications are made to the string (for example, because the range to be erased is empty).
(The latest draft C++ standard has the same wording, although it is now paragraph 4.)
The proposed code involves undefined behaviour if the first character of the string is not alphabetic.
In the first loop through the string, the iterator it
has the value str.begin()
. That iterator cannot be decremented, since the result would not be inside the string. And therefore, incrementing the decremented iterator might not return it
to str.begin()
for the next iteration.
Use indices rather than iterators
None of the above applies to integer position indices. So if you could safely replace your loop with the very similar:
void removeNonAlpha(string& str){
for (auto sz = str.size(), i = 0; i < sz; ++i){
if (!(isUpperCaseLetter(str[i]) ||
isLowerCaseLetter(str[i]) ||
str[i] == ' '))
str.erase(i--);
}
}
It won't compile. str
is std::string and str == ' '
are two different types.
Secondly, when iterating reversed you need to use the string::reverse_iterator, string::rbegin() and string::rend();