Is there a RegExp.escape function in JavaScript?

Posted 2018-12-30 23:16

I just want to create a regular expression out of any possible string.

var usersString = "Hello?!*`~World()[]";
var expression = new RegExp(RegExp.escape(usersString));
var matches = "Hello".match(expression);

Is there a built-in method for that? If not, what do people use? Ruby has Regexp.escape. I don't feel like I should need to write my own; there has to be something standard out there. Thanks!

12 answers
只若初见
#2 · 2018-12-30 23:23

Most of the expressions here solve single specific use cases.

That's okay, but I prefer an "always works" approach.

function regExpEscape(literal_string) {
    return literal_string.replace(/[-[\]{}()*+!<=:?.\/\\^$|#\s,]/g, '\\$&');
}

This will "fully escape" a literal string for any of the following uses in regular expressions:

  • Insertion in a regular expression. E.g. new RegExp(regExpEscape(str))
  • Insertion in a character class. E.g. new RegExp('[' + regExpEscape(str) + ']')
  • Insertion in integer count specifier. E.g. new RegExp('x{1,' + regExpEscape(str) + '}')
  • Execution in non-JavaScript regular expression engines.
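
To make the first three uses concrete, here is a minimal sketch. It repeats regExpEscape so it runs on its own; the sample strings and variable names are only illustrative assumptions.

```javascript
// Copied from the answer above so the snippet is self-contained.
function regExpEscape(literal_string) {
    return literal_string.replace(/[-[\]{}()*+!<=:?.\/\\^$|#\s,]/g, '\\$&');
}

// 1. Whole pattern: the escaped text matches only the original text.
var usersString = 'Hello?!*`~World()[]';
var wholePattern = new RegExp('^' + regExpEscape(usersString) + '$');
var matchesItself = wholePattern.test(usersString);
var matchesHello = wholePattern.test('Hello');

// 2. Character class: "-" stays a literal hyphen instead of forming the range a-z.
var charClass = new RegExp('^[' + regExpEscape('a-z') + ']$');
var classResults = ['a', '-', 'z', 'm'].map(function (ch) {
    return charClass.test(ch);
});

// 3. Count specifier: a plain integer passes through unchanged, giving x{1,3}.
var counted = new RegExp('^x{1,' + regExpEscape('3') + '}$');
var countResults = [counted.test('xxx'), counted.test('xxxx')];
```

Here matchesItself is true and matchesHello is false, the class accepts 'a', '-' and 'z' but not 'm', and the counted pattern accepts three x's but not four.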

Special Characters Covered:

  • -: Creates a character range in a character class.
  • [ / ]: Starts / ends a character class.
  • { / }: Starts / ends a numeration specifier.
  • ( / ): Starts / ends a group.
  • * / + / ?: Specifies repetition type.
  • .: Matches any character.
  • \: Escapes characters, and starts entities.
  • ^: Specifies start of matching zone, and negates matching in a character class.
  • $: Specifies end of matching zone.
  • |: Specifies alternation.
  • #: Specifies comment in free spacing mode.
  • \s: Ignored in free spacing mode.
  • ,: Separates values in numeration specifier.
  • /: Starts or ends expression.
  • :: Completes special group types, and part of Perl-style character classes.
  • !: Negates zero-width group.
  • < / =: Part of zero-width group specifications.

Notes:

  • / is not strictly necessary in any flavor of regular expression. However, it protects in case someone (shudder) does eval("/" + pattern + "/");.
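  • , ensures that if the string is meant to be an integer in the numerical specifier, a stray comma cannot silently change the repetition count. Strict flavors reject the resulting pattern at compile time; JavaScript without the u flag instead treats the malformed {...} as literal text.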
  • # and \s do not need to be escaped in JavaScript, but do in many other flavors. They are escaped here in case the regular expression will later be passed to another program.

If you also need to future-proof the regular expression against potential additions to the JavaScript regex engine capabilities, I recommend using the more paranoid:

function regExpEscapeFuture(literal_string) {
    return literal_string.replace(/[^A-Za-z0-9_]/g, '\\$&');
}

This function escapes every character except those explicitly guaranteed not to be used for syntax in future regular expression flavors.
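
A short sketch of how the paranoid version behaves. The sample string is an assumption for illustration. One thing worth checking in your own environment: with the u flag, JavaScript only allows a backslash before actual syntax characters, so the extra escapes this function produces (for example before a space or !) are rejected there.

```javascript
// Copied from the answer above so the snippet is self-contained.
function regExpEscapeFuture(literal_string) {
    return literal_string.replace(/[^A-Za-z0-9_]/g, '\\$&');
}

var sample = 'be || !be (50%)';
var escaped = regExpEscapeFuture(sample);
// escaped is: be\ \|\|\ \!be\ \(50\%\)

// Without the u flag, the over-escaped pattern still matches the original text.
var legacyMatches = new RegExp('^' + escaped + '$').test(sample);

// With the u flag, escapes such as "\ " and "\!" are a SyntaxError.
var unicodeModeError = null;
try {
    new RegExp(escaped, 'u');
} catch (e) {
    unicodeModeError = e;
}
```

So this version is a reasonable fit for patterns compiled without the u flag; for u-flag patterns, a function that escapes only syntax characters is the safer choice.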


For the truly sanitation-keen, consider this edge case:

var s = '';
new RegExp('(choice1|choice2|' + regExpEscape(s) + ')');

This should compile fine in JavaScript, but will not in some other flavors. If intending to pass to another flavor, the null case of s === '' should be independently checked, like so:

var s = '';
new RegExp('(choice1|choice2' + (s ? '|' + regExpEscape(s) : '') + ')');
旧人旧事旧时光
#3 · 2018-12-30 23:23

Nothing should prevent you from just escaping every non-alphanumeric character:

usersString.replace(/(?=\W)/g, '\\');

You lose a certain degree of readability when doing re.toString() but you win a great deal of simplicity (and security).

According to ECMA-262, regular expression "syntax characters" are always non-alphanumeric, so escaping every non-alphanumeric character is safe. Conversely, the special escape sequences (\d, \w, \n) always use alphanumeric characters, so this approach never turns a plain letter or digit into an unintended escape.
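
As a quick illustration of the lookahead trick, here is a small sketch; the sample string is just an assumption. The zero-width match inserts a backslash before each non-word character and leaves letters, digits, and underscores untouched.

```javascript
var usersString = 'a.b*c (1+1)';

// (?=\W) matches the empty position before every non-word character,
// so the replacement inserts a backslash there without consuming anything.
var escaped = usersString.replace(/(?=\W)/g, '\\');
// escaped is: a\.b\*c\ \(1\+1\)

// The escaped text, used as a pattern, matches exactly the original string.
var roundTrips = new RegExp('^' + escaped + '$').test(usersString);

// Word characters are never prefixed, so no accidental \d or \w appears.
var wordCharsUntouched = 'd_w9'.replace(/(?=\W)/g, '\\') === 'd_w9';
```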

伤终究还是伤i
#4 · 2018-12-30 23:23

This is a shorter version.

RegExp.escape = function(s) {
    return s.replace(/[$-\/?[-^{|}]/g, '\\$&');
};

This also escapes the non-metacharacters %, &, ', and , (they fall inside the $-/ range), but the JavaScript RegExp specification allows escaping them.
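
Because the ranges make the character class hard to read, here is a small sketch that lists which printable ASCII characters it actually matches. The loop and variable names are illustrative only.

```javascript
// Same character class as in RegExp.escape above, without the g flag
// so that repeated test() calls do not carry state between characters.
var shortClass = /[$-\/?[-^{|}]/;

var escapedChars = '';
for (var code = 0x20; code <= 0x7e; code++) {
    var ch = String.fromCharCode(code);
    if (shortClass.test(ch)) {
        escapedChars += ch;
    }
}
// escapedChars is: $%&'()*+,-./?[\]^{|}
```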

素衣白纱
#5 · 2018-12-30 23:24

Mozilla Developer Network's Guide to Regular Expressions provides this escaping function:

function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // $& means the whole matched string
}
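
A typical use is replacing every occurrence of a literal string. The sketch below repeats escapeRegExp so it runs on its own; replaceAllLiteral is a hypothetical helper name, not something from MDN's guide, and it assumes find is non-empty.

```javascript
function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: replaces every literal occurrence of `find` in `text`.
// Assumes `find` is a non-empty string.
function replaceAllLiteral(text, find, replacement) {
  var pattern = new RegExp(escapeRegExp(find), 'g');
  // A function replacer keeps "$" sequences in `replacement` from being
  // interpreted as special replacement patterns.
  return text.replace(pattern, function () {
    return replacement;
  });
}

var result = replaceAllLiteral('1+1=2, 1+1=2', '1+1', '$&');
// result is: $&=2, $&=2
```

Without the escaping step, '1+1' would be read as the pattern "one or more 1s followed by 1" and would not match the literal text.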
初与友歌
#6 · 2018-12-30 23:27

XRegExp has an escape function:

XRegExp.escape('Escaped? <.>'); // -> 'Escaped\?\ <\.>'

More on: http://xregexp.com/api/#escape

浪荡孟婆
#7 · 2018-12-30 23:27

Rather than escaping only the characters that will cause issues in your regular expression (i.e., a blacklist), why not consider using a whitelist instead? This way each character is considered tainted unless it matches the whitelist.

For this example, assume the following expression:

RegExp.escape('be || ! be');

This whitelists letters, digits, underscores, and whitespace:

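RegExp.escape = function (string) {
    // \w already covers digits, and the i flag has no effect on \w or \s,
    // so [^\w\s] with the g flag is equivalent to the longer [^\w\d\s] with gi.
    return string.replace(/[^\w\s]/g, '\\$&');
};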

Returns:

"be \|\| \! be"

This may escape characters that do not need escaping, but that does not break your expression (at most there is a minor performance cost, which is worth it for safety).
