c++ mac os x regex(“.*”) causes infinite loop with

Posted 2019-01-20 17:54

Question:

This causes an infinite loop:

std::regex_replace("the string", std::regex(".*"), "whatevs");

This does NOT cause an infinite loop:

std::regex_replace("the string", std::regex("^.*$"), "whatevs");

What is wrong with the Mac regex implementation? I am using Mac OS X El Capitan with Xcode 7.1.

This question is related to: C++ Mac OS infinite loop in regex_replace if given blank regex expression

Answer 1:

The .* pattern matches the whole string first, and then the empty string at the end, because * means "match 0 or more occurrences of the preceding subpattern". The empty-string match is probably the cause of the infinite loop, but I'm not sure whether it's a bug or by design.
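To make the two matches described above visible, one can walk them with std::sregex_iterator. This is only a minimal sketch: the helper name list_matches is made up for illustration, and the result noted afterwards assumes a standard library whose regex iteration terminates (the loop reported in the question suggests that the Xcode 7.1 library may not).

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: collects (offset, matched text) for every match of
// `pattern` in `text`, in the order the regex iterator reports them.
std::vector<std::pair<std::size_t, std::string>>
list_matches(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);
    std::vector<std::pair<std::size_t, std::string>> out;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        // Offset of the match measured from the start of the original string.
        const auto offset = std::distance(text.begin(), (*it)[0].first);
        out.emplace_back(static_cast<std::size_t>(offset), it->str(0));
    }
    return out;
}
```

On such an implementation, list_matches("the string", ".*") should report two matches: "the string" at offset 0, followed by an empty match at offset 10.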

You can override the behavior using std::regex_constants::match_not_null (see regex_replace c++ reference):

match_not_null (Not null): Empty sequences do not match.

C++ code demo returning whatevs only:

std::regex reg(".*");
std::string s = "the string";
std::cout << std::regex_replace(s, reg, "whatevs",
         std::regex_constants::match_not_null) << std::endl;
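To compare the call with and without the flag, the demo can be wrapped in a small function. This is a sketch under assumptions: replace_dot_star and its skip_empty_matches parameter are hypothetical names, and the behaviour without the flag is what I would expect from a library that does not hang on the trailing empty match.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replaces every match of ".*" in `input` with "whatevs".
// When `skip_empty_matches` is true, match_not_null rejects the empty match
// that ".*" would otherwise find at the end of the string.
std::string replace_dot_star(const std::string& input, bool skip_empty_matches) {
    const std::regex re(".*");
    const auto flags = skip_empty_matches
        ? std::regex_constants::match_not_null
        : std::regex_constants::match_default;
    return std::regex_replace(input, re, "whatevs", flags);
}
```

With the flag set, replace_dot_star("the string", true) gives whatevs. Without it, a library that handles the empty match replaces it as well, so the replacement text appears twice.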

Note that the "infinite loop" you observe is most likely a bug, since the source code hints that an exception should be thrown once an empty string is passed to the regex engine. As far as I know, it has not been reported anywhere yet. I think (but am not sure) that the issue lies in how regex_replace handles the string when it collects matches for a replace operation.

Here is what happens. This regex_replace overload is called:

basic_string<_Elem, _Traits1, _Alloc1> regex_replace(
    const basic_string<_Elem, _Traits1, _Alloc1>& _Str,
    const basic_regex<_Elem, _RxTraits>& _Re,
    const _Elem *_Ptr,
    regex_constants::match_flag_type _Flgs = regex_constants::match_default)
{   // search and replace, string result, string target, NTBS format
    basic_string<_Elem, _Traits1, _Alloc1> _Res;
    const basic_string<_Elem> _Fmt(_Ptr);
    regex_replace(_STD back_inserter(_Res), _Str.begin(), _Str.end(),
        _Re, _Fmt, _Flgs);
    return (_Res);
}

At this point _Res is an empty string and _Fmt holds whatevs. The iterator overload of regex_replace is then called; _Str.end() sits at offset 10, and a pointer is initialized.

_First equals the string and _Last equals an empty string.

It happens as a result of internal char buffer processing whose pointer actually contains an array of:

The inline back_insert_iterator<_Container> back_inserter(_Container& _Cont) method first creates a string out of the first 0 to 9 chars, and then from 10 to 15 array elements (the one starting with the null terminator).
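The forwarding step in the quoted overload can be reproduced in user code by calling the iterator overload of std::regex_replace directly. This is a sketch only: replace_via_iterators is a hypothetical name, and the function mirrors the shape of the library code quoted above rather than any particular vendor's implementation.

```cpp
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper with the same shape as the quoted overload: build an
// empty result, turn the NTBS format into a string, and let the iterator
// overload append the output through a back_inserter.
std::string replace_via_iterators(
    const std::string& str,
    const std::regex& re,
    const char* fmt_ptr,
    std::regex_constants::match_flag_type flags = std::regex_constants::match_default) {
    std::string result;              // corresponds to _Res, initially empty
    const std::string fmt(fmt_ptr);  // corresponds to _Fmt
    // str.end() is an iterator one past the last character; for "the string"
    // it lies 10 positions after str.begin().
    std::regex_replace(std::back_inserter(result), str.begin(), str.end(),
                       re, fmt, flags);
    return result;
}
```

Calling replace_via_iterators("the string", std::regex("^.*$"), "whatevs") exercises the anchored pattern from the question, and passing std::regex_constants::match_not_null as the last argument exercises the workaround.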



Answer 2:

stribizhev's answer inspired this one. Here are example results using various flags:


GOOD

boost::regex_replace(input, match, replace, input.empty() ? boost::regex_constants::match_default : boost::regex_constants::match_not_null);

results:

input:   ""
match:   ".*"
replace: "a"
output:  "a"

input:   "something"
match:   ".*"
replace: "a"
output:  "a"
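The same flag selection as in the GOOD call can be sketched with std::regex instead of boost::regex. The function name replace_skipping_trailing_empty is hypothetical, and it is an assumption that the two libraries behave alike here; the results listed in this answer were produced with Boost.

```cpp
#include <regex>
#include <string>

// Hypothetical std::regex port of the GOOD call: an empty input keeps the
// default flags so that ".*" may match the empty string once, while a
// non-empty input uses match_not_null so the trailing empty match is skipped.
std::string replace_skipping_trailing_empty(const std::string& input,
                                            const std::regex& match,
                                            const std::string& replace) {
    const auto flags = input.empty()
        ? std::regex_constants::match_default
        : std::regex_constants::match_not_null;
    return std::regex_replace(input, match, replace, flags);
}
```

The design choice is to treat the empty input as the one case where an empty match is wanted, since match_not_null alone would leave an empty input unchanged.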

BAD

boost::regex_replace(input, match, replace, boost::regex_constants::match_not_null);

results:

input:   ""
match:   ".*"
replace: "a"
output:  ""

input:   "something"
match:   ".*"
replace: "a"
output:  "a"

BAD

boost::regex_replace(input, match, replace);

results:

input:   ""
match:   ".*"
replace: "a"
output:  "a"

input:   "something"
match:   ".*"
replace: "a"
output:  "aa"