How to match with regex all special chars except “-”

Posted 2019-02-06 18:07

Question:

How can I match all the “special” chars (like +_*&^%$#@!~) except the char - in PHP?

I know that \W will match all the “special” chars including the -.

Any suggestions in consideration of Unicode letters?

Answer 1:

  • [^-] matches any character that is not the - you want to exclude
  • [\W] matches all "special" (non-word) characters, as you know
  • [^\w] matches exactly the same characters as [\W]

So [^\w-] combines the two: it matches all "special" characters except -.
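As a minimal sketch of how this class behaves in PHP: the helper name special_chars_w and the sample string are made up for illustration, and the \u{...} string escape assumes PHP 7.0 or later. The u modifier is added because the question asks about Unicode letters.

```php
<?php
// Hypothetical helper (not part of the answer): returns every character
// of $text that is neither a word character nor a hyphen.
function special_chars_w(string $text): array
{
    // The u modifier makes PCRE read pattern and subject as UTF-8; in PHP it
    // also makes \w Unicode-aware, so accented letters count as word characters.
    preg_match_all('/[^\w-]/u', $text, $matches);
    return $matches[0];
}

// "\u{00E9}" is the letter é.
print_r(special_chars_w("a-b+c_d*\u{00E9}!"));
// _ is a word character and - is excluded, so only +, * and ! are returned.
```

Since \w includes the underscore, a pattern such as [^\w-]|_ would be one way to treat _ as "special" too, as the question does.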



Answer 2:

  • \pL matches any character with the Unicode Letter character property, which is a major general category group; that is, it matches [\p{Ll}\p{Lt}\p{Lu}\p{Lm}\p{Lo}].
  • \pN matches any character with the Unicode Number character property, which is a major general category group; that is, it matches [\p{Nd}\p{Nl}\p{No}].
  • Note that the Unicode Alphabetic character property also includes certain combining marks such as U+0345 ◌ͅ ᴄᴏᴍʙɪɴɪɴɢ ɢʀᴇᴇᴋ ʏᴘᴏɢᴇɢʀᴀᴍᴍᴇɴɪ. I suggest that you also include \pM, which matches any character with the Unicode Mark character property, which is a major general category group; that is, it matches [\p{Mn}\p{Me}\p{Mc}].
  • Character U+002D ʜʏᴘʜᴇɴ-ᴍɪɴᴜꜱ is probably the - you’re referring to.
  • Note though that Unicode v6.1 has 27 characters with the Unicode Dash character property, including such common characters as U+2010 ʜʏᴘʜᴇɴ, U+2013 ᴇɴ ᴅᴀꜱʜ, U+2014 ᴇᴍ ᴅᴀꜱʜ, and U+2212 ᴍɪɴᴜꜱ ꜱɪɢɴ. Whether you actually want to include or exclude those, I have no idea.

Given all that, it is not unlikely that you want something like:

[^\pL\pN\pM\x2D\x{2010}-\x{2015}\x{2212}]
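To try this class from PHP, something like the following sketch should work. The helper name special_chars_unicode and the sample string are hypothetical, and the \u{...} string escape assumes PHP 7.0 or later. The pattern needs the u modifier, because \x{2010} and the other code points above U+00FF are only accepted in UTF-8 mode.

```php
<?php
// Hypothetical helper: returns every character of $text that is not a
// letter, number, mark, or one of the dash characters listed in the class.
function special_chars_unicode(string $text): array
{
    // Single quotes pass the backslashes through to PCRE unchanged.
    $pattern = '/[^\pL\pN\pM\x2D\x{2010}-\x{2015}\x{2212}]/u';
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

// Sample: é, hyphen-minus, en dash (U+2013), em dash (U+2014),
// minus sign (U+2212), plus sign, underscore, separated by digits.
$sample = "\u{00E9}1-2\u{2013}3\u{2014}4\u{2212}5+6_7";
print_r(special_chars_unicode($sample));
// Only + and _ are returned; the letter, digits and dashes are excluded.
```

Unlike \w, this class has no special case for the underscore, so _ is matched here. Whitespace is matched as well, since it is not a letter, number, or mark.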


Answer 3:

You can try this pattern:

([^a-zA-Z-])

This matches every character that is not an ASCII letter (a-z, A-Z) or -. Note that digits, whitespace, and non-ASCII letters are matched as well.
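A quick check of what this pattern picks up, as a sketch only: special_chars_ascii and the sample string are invented for the example, the u modifier is an addition so that é is read as one character rather than two bytes, and the \u{...} escape assumes PHP 7.0 or later.

```php
<?php
// Hypothetical helper: returns every character of $text captured by the
// pattern, i.e. anything that is not an ASCII letter or a hyphen.
function special_chars_ascii(string $text): array
{
    preg_match_all('/([^a-zA-Z-])/u', $text, $matches);
    return $matches[1]; // contents of the single capture group
}

print_r(special_chars_ascii("ab-1+\u{00E9}"));
// Returns 1, + and é: the digit and the accented letter count as "special".
```

That behaviour may or may not be wanted, given that the question asks about Unicode letters.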