Does using the erase function in a string invalida

2019-08-06 12:42发布

问题:

I have the following code that takes a string and erases non alphabet characters

void removeNonAlpha(string& str){
    for (string::iterator it = str.begin(); it < str.end(); it++){
        if (!(isUpperCaseLetter(*it) || isLowerCaseLetter(*it) || str == ' '))  
            str.erase(it--);

    }
}

I showed this to my professor and he told me that doing this is risky because it may invalidate the iterator that I'm using. However, I thought that erase will only invalidate iterators after the point of the erase, and I made sure not to use any iterators after that point. So could this code crash or cause any undefined behavior?

回答1:

std::vector::erase works as you suggest; it only invalidates iterators starting with the first erased element. However, that doesn't apply to std::string.

C++ allows string iterators to be invalidated at the drop of a hat.

The C++ standard has traditionally been more flexible with the requirements for std::string. (Or, in other words, it has traditionally allowed implementers to use optimizations which would not be valid for vectors.) And so it is with std::string::erase, and other string mutators.

In [string.require] (§21.4.1 of n3797), the standard accepts that:

  1. References, pointers, and iterators referring to the elements of a basic_string sequence may be invalidated by the following uses of that basic_string object:
    • as an argument to any standard library function taking a reference to non-const basic_string as an argument.
    • Calling non-const member functions, except operator[], at, front, back, begin, rbegin, end, and rend.

In other words, calling a potentially mutating function like std::string::erase could invalidate all iterators to that string, even if no visible modifications are made to the string (for example, because the range to be erased is empty).

(The latest draft C++ standard has the same wording, although it is now paragraph 4.)

The proposed code involves undefined behaviour if the first character of the string is not alphabetic.

In the first loop through the string, the iterator it has the value str.begin(). That iterator cannot be decremented, since the result would not be inside the string. And therefore, incrementing the decremented iterator might not return it to str.begin() for the next iteration.

Use indices rather than iterators

None of the above applies to integer position indices. So if you could safely replace your loop with the very similar:

void removeNonAlpha(string& str){
    for (auto sz = str.size(), i = 0; i < sz; ++i){
        if (!(isUpperCaseLetter(str[i]) ||
              isLowerCaseLetter(str[i]) ||
              str[i] == ' '))  
            str.erase(i--);
    }
}


回答2:

It won't compile. str is std::string and str == ' ' are two different types.

Secondly, when iterating reversed you need to use the string::reverse_iterator, string::rbegin() and string::rend();