Regex modifier /u in JavaScript?

Posted 2020-06-01 03:15

Question:

I recently wrote a regex for my PHP code that allows only letters (including special characters) and spaces: /^[\s\p{L}]+$/u. Now I'm having trouble converting it into a JavaScript-compatible regex. The problem is the /u modifier at the end of the pattern, as JavaScript doesn't allow that flag.

How can I rewrite this so that it works in JavaScript as well?

Is there a way to allow only Polish characters: Ł, Ą, Ś, Ć, ...?

Answer 1:

The /u modifier enables Unicode support. It was added to JavaScript in ES2015.

Read http://stackoverflow.com/questions/280712/javascript-unicode to learn more about Unicode in JavaScript regexes.


Polish characters:

Ą \u0104
Ć \u0106
Ę \u0118
Ł \u0141
Ń \u0143
Ó \u00D3
Ś \u015A
Ź \u0179
Ż \u017B
ą \u0105
ć \u0107
ę \u0119
ł \u0142
ń \u0144
ó \u00F3
ś \u015B
ź \u017A
ż \u017C

All special Polish characters:

[\u0104\u0106\u0118\u0141\u0143\u00D3\u015A\u0179\u017B\u0105\u0107\u0119\u0142\u0144\u00F3\u015B\u017A\u017C]
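As a minimal sketch of how the character class above could be used, it can be combined with the ASCII letters and \s into an anchored pattern. The names polishLettersOnly and isPolishText are hypothetical and not part of the original answer; the sample strings are only illustrations.

```javascript
// Hypothetical validator: ASCII letters, whitespace, and the 18
// Polish-specific letters listed above, written as \u escapes.
const polishLettersOnly =
  /^[\sa-zA-Z\u0104\u0106\u0118\u0141\u0143\u00D3\u015A\u0179\u017B\u0105\u0107\u0119\u0142\u0144\u00F3\u015B\u017A\u017C]+$/;

// Returns true only if every character is a letter from the class or whitespace.
function isPolishText(value) {
  return polishLettersOnly.test(value);
}

console.log(isPolishText("Zażółć gęślą jaźń")); // true
console.log(isPolishText("Łódź 2020"));         // false (digits are not allowed)
console.log(isPolishText(""));                  // false (+ requires at least one character)
```

Note that \s also accepts tabs and line breaks, not just the space character. The pattern needs no /u flag, so it should also work in engines that predate ES2015.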


Answer 2:

JavaScript doesn't have any notion of UTF-8 strings, so it's unlikely that you need the /u flag. (Your strings are probably already in the usual JavaScript form, one UTF-16 code-unit per "character".)

The bigger problem is that JavaScript before ES2018 doesn't support \p{L}, nor any equivalent notation; regexes in those engines have no awareness of Unicode character properties. (ES2018 added \p{...} property escapes, which require the /u flag.) See the answers to this Stack Overflow question for some ways to approximate it.


Edited to add: If you only need to support Polish letters, then you can write /^[\sa-zA-ZĄĆĘŁŃÓŚŹŻąćęłńóśźż]+$/. The a-z and A-Z parts cover the ASCII letters, and then the remaining letters are listed out individually.
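One caveat with listing the letters individually is that input may arrive in decomposed form, for example "o" followed by a combining acute accent instead of the single character "ó". The sketch below assumes the input might be decomposed and uses String.prototype.normalize (available since ES2015) before testing. The variable names are hypothetical.

```javascript
// The pattern from the answer above, with the letters written literally.
const polishLetters = /^[\sa-zA-ZĄĆĘŁŃÓŚŹŻąćęłńóśźż]+$/;

// "Kraków" with a decomposed ó: "o" followed by U+0301 COMBINING ACUTE ACCENT.
const decomposed = "Krako\u0301w";

// The combining mark is not in the character class, so the raw string fails.
console.log(polishLetters.test(decomposed));                  // false

// NFC normalization composes "o" + U+0301 into U+00F3, which is in the class.
console.log(polishLetters.test(decomposed.normalize("NFC"))); // true
```

Whether normalization is needed depends on where the strings come from, so treat this as an optional precaution rather than a required step.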



Answer 3:

As of ES2015, /u is supported in JavaScript. See:

  • https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicode
  • https://www.ecma-international.org/ecma-262/6.0/#sec-get-regexp.prototype.unicode
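For illustration, here is a sketch of the original PHP pattern used unchanged in JavaScript. It assumes an engine that supports both the /u flag (ES2015) and Unicode property escapes such as \p{L} (added later, in ES2018); older engines will reject the pattern with a SyntaxError. The name lettersAndSpaces is hypothetical.

```javascript
// Same pattern as the PHP original; \p{L} is only valid together with the u flag.
const lettersAndSpaces = /^[\s\p{L}]+$/u;

// The unicode accessor reports whether the u flag is set.
console.log(lettersAndSpaces.unicode);                   // true

console.log(lettersAndSpaces.test("Zażółć gęślą jaźń")); // true
console.log(lettersAndSpaces.test("Привет мир"));        // true (letters from any script)
console.log(lettersAndSpaces.test("abc123"));            // false (digits are not letters)
```

Unlike an explicit list of Polish characters, \p{L} accepts letters from every script, so it fits the "letters only" requirement but not the "Polish only" one.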