JavaScript regular expression for word boundaries

Posted 2019-08-27 07:54

I'm looking for a Regular Expression for JavaScript that will identify word boundaries in English, while accepting hyphens and apostrophes that appear inside words, but excluding those that appear alone or at the beginning or end of a word.

For example, for the sentence ...
  She said - 'That'll be all, Two-Fry.'
... I want the characters shown in square brackets below to be detected:
  She[ ]said[ - ']That'll[ ]be[ ]all[, ]Two-Fry[.']

If I use the regex /[^A-Za-z'-]/g, then "loose" hyphens and apostrophes are not detected (matched characters are shown in square brackets):
  She[ ]said[ ]-[ ]'That'll[ ]be[ ]all[, ]Two-Fry[.]'
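
To make the gap concrete, here is a minimal sketch that lists the index of every character this regex matches in the example sentence. The names `sentence`, `looseRegex` and `matchedIndexes` are made up for illustration only.

```javascript
// Example sentence from the question.
const sentence = "She said - 'That'll be all, Two-Fry.'";

// The regex from the question: anything that is not an ASCII letter,
// an apostrophe or a hyphen.
const looseRegex = /[^A-Za-z'-]/g;

// Collect the position of every matched character.
const matchedIndexes = [...sentence.matchAll(looseRegex)].map((m) => m.index);

console.log(matchedIndexes);
// Only the spaces, the comma and the period are reported. The lone hyphen
// (index 9) and the leading/trailing apostrophes (indexes 11 and 36) are
// skipped, because the character class treats them as word characters
// wherever they appear.
```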

How can I alter my regex so that it detects apostrophes and hyphens that don't have a word character on both sides?

You can test my regex here: https://regex101.com/r/bR8sV1/2

Note: the text I will be working on may contain other writing scripts, like руский and ไทอ so it will not be feasible to simply include all the characters that are not part of any English word.

2 answers
ら.Afraid
#2 · 2019-08-27 08:39

You can organize your word-boundary characters into two groups.

  1. Characters that cannot be alone: they count as a boundary only in a run of two or more.
  2. Characters that can be alone: a single one is already a boundary.

A regex that works with your example would be:

[\s.,'-]{2,}|[\s.]


Now all that's left is to keep adding all non-word characters into those two groups until it fits all of your needs. So you might start adding symbols and more punctuation to those character classes.
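
As a rough sketch of how this regex behaves on the example sentence (the variable names `boundary`, `separators` and `words` are only illustrative), you could run:

```javascript
// Example sentence from the question.
const sentence = "She said - 'That'll be all, Two-Fry.'";

// First alternative: a run of two or more characters from the "cannot be alone" group.
// Second alternative: a single character from the "can be alone" group.
const boundary = /[\s.,'-]{2,}|[\s.]/g;

// Every separator the regex finds, in order.
const separators = sentence.match(boundary);

// Splitting on the separators leaves the words; the final empty string
// (the sentence ends with a separator) is filtered out.
const words = sentence.split(boundary).filter(Boolean);

console.log(separators); // [" ", " - '", " ", " ", ", ", ".'"]
console.log(words);      // ["She", "said", "That'll", "be", "all", "Two-Fry"]
```

Note that with the groups exactly as written, a comma with no space after it (as in "a,b") would not be treated as a boundary, since the comma is only in the first group. Whether that matters depends on your text.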

兄弟一词,经得起流年.
#3 · 2019-08-27 08:44

You could write something like this:

(\s|[!-/]|[:-@]|[\[-`]|[\{-~])*\s(\s|[!-/]|[:-@]|[\[-`]|[\{-~])*

Or the compact version:

(\s|[!-/:-@\[-`\{-~])*\s(\s|[!-/:-@\[-`\{-~])*

The RegExp requires one \s (whitespace character) and also selects all whitespace and non-alphanumeric ASCII characters directly before and after it.

https://regex101.com/r/bR8sV1/4

  • \s matches any whitespace character
  • !-/ every char from ! to /
  • :-@ every char from : to @
  • \[-` every char from [ to `
  • \{-~ every char from { to ~
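
Here is a small sketch of the compact version applied to the example sentence. It swaps the capturing groups for non-capturing ones, which is my own adjustment so that `split()` does not copy group contents into its result; the variable names are illustrative.

```javascript
// Example sentence from the question.
const sentence = "She said - 'That'll be all, Two-Fry.'";

// Compact version of the regex, with (?:...) instead of (...) and the
// slash escaped so it can be written as a regex literal.
const boundary = /(?:\s|[!-\/:-@\[-`{-~])*\s(?:\s|[!-\/:-@\[-`{-~])*/g;

// Every separator the regex finds, in order.
const separators = sentence.match(boundary);

// The pieces left between the separators.
const words = sentence.split(boundary);

console.log(separators); // [" ", " - '", " ", " ", ", "]
console.log(words);      // ["She", "said", "That'll", "be", "all", "Two-Fry.'"]
```

Because at least one whitespace character is required, the closing `.'` at the very end of the sentence is not matched here and stays attached to the last word. If that matters, the end of the string would need separate handling.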