List of all characters that should be escaped before putting them into a RegEx

Posted 2019-01-06 12:56

Question:

Could someone please give a complete list of special characters that should be escaped?

I fear I don't know some of them.

Answer 1:

Take a look at PHP.JS's implementation of PHP's preg_quote function, which should do what you need:

http://phpjs.org/functions/preg_quote:491

The special regular expression characters are: . \ + * ? [ ^ ] $ ( ) { } = ! < > | : -
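In JavaScript, escaping every character in the list above can be sketched with a single replace call. This is a minimal sketch, not the PHP.JS code: the name pregQuoteLike is hypothetical, it ignores preg_quote's optional delimiter argument, and it assumes the resulting pattern is compiled without the u flag, because in u-mode regexes escapes such as \: or \= are syntax errors.

```javascript
// Hypothetical helper (not PHP.JS's preg_quote): backslash-escapes every
// character in the list . \ + * ? [ ^ ] $ ( ) { } = ! < > | : -
// Assumes the result is used WITHOUT the u flag.
function pregQuoteLike(str) {
  // "$&" in the replacement inserts the matched character after the backslash.
  return str.replace(/[.\\+*?\[^\]$(){}=!<>|:\-]/g, "\\$&");
}

const pattern = pregQuoteLike("(a+b):c");
console.log(pattern);                              // \(a\+b\)\:c
console.log(new RegExp(pattern).test("(a+b):c")); // true
```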



Answer 2:

According to this site, the list of characters to escape is

the opening square bracket [, the backslash \, the caret ^, the dollar sign $, the period or dot ., the vertical bar or pipe symbol |, the question mark ?, the asterisk or star *, the plus sign +, the opening round bracket ( and the closing round bracket ).

In addition, when the pattern is written as a JavaScript string literal, you need to escape the characters that the JavaScript interpreter would treat as the end of the string, that is, ' or " (whichever one delimits the string).
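A small sketch of the difference between regex-level and string-level escaping when a pattern reaches new RegExp as a string. The variable names are illustrative only.

```javascript
// A regex literal needs only the regex-level escape.
const literal = /a\.b/;
// In a string passed to new RegExp, the backslash itself must be doubled,
// so that the RegExp constructor receives the text a\.b
const fromString = new RegExp("a\\.b");
// With a single backslash, "\." is just "." in the string, so the dot
// reaches the RegExp unescaped and matches any character.
const unescapedDot = new RegExp("a\.b");
// A quote that delimits the string literal is escaped for JavaScript, not for the regex.
const withQuote = new RegExp('it\'s');

console.log(literal.source === fromString.source); // true
console.log(literal.test("axb"));      // false
console.log(unescapedDot.test("axb")); // true
console.log(withQuote.test("it's"));   // true
```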



Answer 3:

The hyphen (-) needs to be escaped inside square brackets when it is not at the start or the end of the class. For example, you need to escape - in

[a-z0-9\-_]+

There is no need to escape - in

[a-z0-9_-]+
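The rule matters because an unescaped hyphen between two single characters creates a range. The sketch below checks the two patterns above and then uses an illustrative class of + - . (chosen for this example, not from the answer) to show an accidental range; the variable names are hypothetical.

```javascript
// Both patterns from the answer accept letters, digits, "_" and "-".
const escapedMiddle = /^[a-z0-9\-_]+$/;
const hyphenAtEnd = /^[a-z0-9_-]+$/;
console.log(escapedMiddle.test("foo-bar_1"), hyphenAtEnd.test("foo-bar_1")); // true true

// An unescaped "-" between two single characters forms a range:
// [+-.] means "+ through ." (U+002B to U+002E), which also includes ",".
const accidentalRange = /^[+-.]$/;
// Escaped, the hyphen is a literal: the class is just "+", "-" and ".".
const literalHyphen = /^[+\-.]$/;
// At the end of the class, the hyphen is literal without escaping.
const trailingHyphen = /^[+.-]$/;

console.log(accidentalRange.test(",")); // true
console.log(literalHyphen.test(","));   // false
console.log(trailingHyphen.test(","));  // false
console.log(literalHyphen.test("-"), trailingHyphen.test("-")); // true true
```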


Answer 4:

Based on Tatu Ulmanen's answer, my solution in C# took this form:

// Requires: using System.Collections.Generic;
private static List<string> RegexSpecialCharacters = new List<string>
{
    "\\",
    ".",
    "+",
    "*",
    "?",
    "[",
    "^",
    "]",
    "$",
    "(",
    ")",
    "{",
    "}",
    "=",
    "!",
    "<",
    ">",
    "|",
    ":",
    "-"
};


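// Start from the input and apply each replacement to the accumulated result;
// calling input.Replace on every iteration would keep only the last replacement.
string rgxPattern = input;
foreach (var rgxSpecialChar in RegexSpecialCharacters)
    rgxPattern = rgxPattern.Replace(rgxSpecialChar, "\\" + rgxSpecialChar);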

Note that I moved '\' ahead of '.' in the list: the backslashes must be processed first, otherwise the backslashes inserted for earlier characters get escaped again and end up doubled.
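The ordering problem can be reproduced in JavaScript with a hypothetical escapeSequential helper that mirrors the C# loop: one replace-all per character (done here with split/join), each applied to the previous result.

```javascript
// Same characters as the C# list above, with the backslash first.
const SPECIAL = ["\\", ".", "+", "*", "?", "[", "^", "]", "$", "(", ")",
                 "{", "}", "=", "!", "<", ">", "|", ":", "-"];

// Hypothetical helper mirroring the C# loop: one replace-all per character,
// each applied to the result of the previous step.
function escapeSequential(input, chars) {
  let result = input;
  for (const ch of chars) {
    // split/join replaces every occurrence of ch
    result = result.split(ch).join("\\" + ch);
  }
  return result;
}

console.log(escapeSequential("a.b", SPECIAL));     // a\.b
// Backslash processed last: the backslash added for "." is escaped again.
console.log(escapeSequential("a.b", [".", "\\"])); // a\\.b
```

A single regex-based replace, such as the escapeRegExp function in the last answer, looks at each input character only once, so the order of the characters does not matter there.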



Answer 5:

I was looking for this list in regard to ESLint's "no-useless-escape" rule for regular expressions, and found that some of the characters mentioned here do not need to be escaped in a JavaScript regular expression. The longer list in another answer here comes from PHP's preg_quote, which escapes the additional characters as well.

In this GitHub issue for ESLint, about halfway down, user not-an-aardvark explains why the character referenced in the issue may need to be escaped.

In JavaScript, the characters that NEED to be escaped are the syntax characters (plus / when writing a regex literal):

^ $ \ . * + ? ( ) [ ] { } |

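The response to the GitHub issue I linked to above includes an explanation of "Annex B" semantics (which I don't know much about), which allow three of the above-mentioned characters to appear UNescaped as literals: ] { } (a { only as long as it does not form a valid quantifier such as {2}). An unmatched ) is still a syntax error, and Annex B does not apply to regexes with the u flag, where all of these must be escaped.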

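Another thing to note is that escaping a punctuation character that doesn't require escaping does no harm in a regex without the u flag: \: simply matches :. There are two caveats: escaping a letter changes its meaning (\d matches a digit, not d), and with the u flag an unnecessary escape such as \: is a syntax error. So, my personal rule of thumb is: "When in doubt, escape" (punctuation only, and not in u-flag regexes).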
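These rules can be checked directly by trying to compile small patterns. compiles is a hypothetical helper that reports whether new RegExp accepts a source string with the given flags; the results shown assume a current engine such as Node.js or a modern browser.

```javascript
// Hypothetical helper: true if `source` compiles as a regex with `flags`.
function compiles(source, flags = "") {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // anything other than a syntax error is unexpected
  }
}

// Annex B (no u flag): ], { and } may appear unescaped; ) may not.
console.log(compiles("]"), compiles("{"), compiles("}")); // true true true
console.log(compiles(")"));                                // false
// With the u flag, Annex B does not apply.
console.log(compiles("]", "u"));                           // false
// An unnecessary escape of punctuation: harmless without u, an error with u.
console.log(compiles("\\:"), compiles("\\:", "u"));        // true false
// Escaping a letter changes its meaning: \d is a digit class, not "d".
console.log(/\d/.test("d"));                               // false
```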



Answer 6:

The problem:

const character = '+'
new RegExp(character, 'gi') // throws a SyntaxError ("+" has nothing to repeat)

Smart solutions:

// ES2025 (RegExp.escape; not available in older engines)
const character = '+'
const escapeCharacter = RegExp.escape(character)
new RegExp(escapeCharacter, 'gi') // /\+/gi

// Without RegExp.escape (escapeRegExp is defined below)
const character = '+'
const escapeCharacter = escapeRegExp(character)
new RegExp(escapeCharacter, 'gi') // /\+/gi

function escapeRegExp(string){
    return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}
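A usage sketch of escapeRegExp for building a pattern from arbitrary text. countLiteral and anyOf are hypothetical helpers; the character-class case shows that escapeRegExp's output is meant for use outside [...], because it leaves - unescaped (compare the hyphen rule in Answer 3).

```javascript
// Same function as in the answer above.
function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Hypothetical helper: count literal occurrences of `needle` in `haystack`.
function countLiteral(haystack, needle) {
  const matches = haystack.match(new RegExp(escapeRegExp(needle), 'g'))
  return matches ? matches.length : 0
}

console.log(escapeRegExp('(1+1)*2'))    // \(1\+1\)\*2
console.log(countLiteral('1+1+1', '+')) // 2

// escapeRegExp does not escape "-", so its output is not safe inside [...]:
// "[\+-\.]" is a range from "+" to ".", which also matches ",".
console.log(new RegExp('[' + escapeRegExp('+-.') + ']').test(',')) // true

// Hypothetical helper: a character class matching any one of `chars`,
// with "-" escaped as well ("]", "\" and "^" are already escaped).
function anyOf(chars) {
  return new RegExp('[' + escapeRegExp(chars).replace(/-/g, '\\-') + ']')
}

console.log(anyOf('+-.').test(',')) // false
console.log(anyOf('+-.').test('-')) // true
```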