C++ regex: escaping punctuation characters like “.”

Posted 2019-06-06 10:34

Question:

Matching a "." in a string with the std::tr1::regex class makes me use a weird workaround.

Why do I need to check for "\\\\." instead of "\\."?

regex(".") // Matches everything (but "\n") as expected.
regex("\\.") // Matches everything (but "\n").
regex("\\\\.") // Matches only ".".
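For comparison, here is a minimal sketch of two things. First, what each of the three literals becomes after the compiler has processed its string escapes. Second, how the result behaves under std::regex_match, which must match the whole input. It assumes C++11 std::regex in place of std::tr1::regex, with the default ECMAScript grammar. The helper whole_match and the variable names are illustrative only.

```cpp
#include <regex>
#include <string>

// Each pattern is written exactly as in the question. The trailing comment shows
// the characters the regex engine receives once the compiler has processed the
// string-literal escapes (\\ becomes a single backslash).
const std::regex any_char(".");                // .   : any character except a line terminator
const std::regex literal_dot("\\.");           // \.  : a literal '.'
const std::regex backslash_then_any("\\\\.");  // \\. : a literal '\' followed by any character

// Illustrative helper: true only if `re` matches the whole of `s`.
bool whole_match(const std::string& s, const std::regex& re) {
    return std::regex_match(s, re);
}
```

Under these assumptions, the middle pattern already matches only a dot. The "matches everything" behaviour described above comes from how the search was driven, which is the issue Answer 1 identifies.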

Can someone explain to me why? It's really bothering me, since I originally wrote my code with the boost::regex classes, which didn't need this syntax.

Edit: Sorry, regex("\\\\.") seems to match nothing.

Edit 2: Here is some of the code:

void parser::lex(regex& token)
{
    // Skipping whitespaces
    {
        regex ws("\\s*");
        sregex_token_iterator wit(source.begin() + pos, source.end(), ws, regex_constants::match_default), wend;
        if(wit != wend)
            pos += (*wit).length();
    }

    sregex_token_iterator it(source.begin() + pos, source.end(), token, regex_constants::match_default), end;
    if (it != end)
        temp = *it;
    else
        temp = "";
}

Answer 1:

As it turns out, the actual problem was the way sregex_token_iterator was used. With match_default, it always searched ahead for the next match in the string, if there was one, even when non-matching text lay in between. That is,

string source = "AAA.BBB";
regex dot("\\.");
// The 4th argument is the submatch index (0 = whole match); the match flags come 5th.
sregex_token_iterator wit(source.begin(), source.end(), dot, 0, regex_constants::match_default);

would report a match at the dot, rather than reporting that the regex does not match at the start of the string.

The solution is to use match_continuous instead, which only accepts a match that begins at the start of the target range.
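Here is a minimal sketch of the fix. It assumes C++11 std::regex rather than std::tr1::regex. The helper next_token is hypothetical and only mirrors the lexing step in the question. One detail is worth noting: in the std::sregex_token_iterator constructor, the fourth parameter is the submatch index and the match flags are the fifth. Swapping match_continuous into the fourth position of the question's call would therefore pass it as a submatch index instead of a flag.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical lexer helper mirroring parser::lex from the question: returns the
// text `token` matches starting exactly at `pos`, or "" if it does not match there.
// Constructor arguments: (first, last, regex, submatch index, flags); submatch 0
// means "the whole match", and the match flags are the fifth argument.
std::string next_token(const std::string& source, std::size_t pos,
                       const std::regex& token,
                       std::regex_constants::match_flag_type flags =
                           std::regex_constants::match_continuous) {
    std::sregex_token_iterator it(source.begin() + pos, source.end(), token, 0, flags);
    const std::sregex_token_iterator end;  // default-constructed end-of-sequence iterator
    return it != end ? it->str() : std::string();
}
```

With "AAA.BBB" and the pattern "\\.", passing match_default at position 0 still skips ahead to the dot. The default match_continuous reports no token there and only finds the dot at position 3.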



Answer 2:

This is because the C++ compiler processes escape sequences in string literals before the regex engine ever sees the pattern. \. is not a standard C++ escape sequence, so the compiler itself tries to interpret it as a single character. What you want is for the regex to receive the two characters \., which you write in source code as "\\." because \\ is the escape sequence for a single backslash character (\).
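Here is a short sketch of the string-literal side of this, assuming a C++11 compiler; the names are illustrative. It checks that "\\." really holds two characters at run time. It also shows the C++11 raw string literal R"(\.)" as an alternative spelling that avoids doubling the backslash.

```cpp
#include <regex>
#include <string>

// "\\." in source code: the compiler turns \\ into one backslash, so at run
// time the string holds two characters, '\' and '.', which the regex engine
// then reads as "a literal dot".
const std::string escaped_dot = "\\.";

// C++11 raw string literal: nothing between R"( and )" is escape-processed,
// so the same two characters can be written with a single backslash.
const std::string raw_escaped_dot = R"(\.)";

// Either spelling can be handed to std::regex.
const std::regex literal_dot(raw_escaped_dot);
```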



Answer 3:

Alternatively, try escaping the dot by its ASCII code in hexadecimal (0x2E):

regex("\\x2E")
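Here is a sketch contrasting the two ways the hex escape could be written. It assumes C++11 std::regex with the default ECMAScript grammar, where \xHH is a hexadecimal character escape, and an ASCII-compatible execution character set. The variable names are illustrative. The backslash must still be doubled: with a single backslash, the compiler itself expands \x2E to '.', and the regex engine receives the wildcard.

```cpp
#include <regex>

// "\\x2E": the compiler turns \\ into one backslash, so the regex engine sees
// \x2E, an ECMAScript hex escape for character 0x2E ('.'), matched literally.
const std::regex hex_dot("\\x2E");

// "\x2E" (single backslash): here the compiler expands the hex escape, so the
// regex engine receives a bare '.', which is the "any character" wildcard.
const std::regex compiler_expanded("\x2E");
```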