Regex to find words that start with a specific character

Posted 2020-02-05 06:07

I am trying to find words that start with a specific character, for example:

Lorem ipsum #text Second lorem ipsum. How #are You. It's ok. Done. Something #else now.

I need to get all words that start with "#", so my expected results are #text, #are, and #else.

Any ideas?

Tags: c# regex
5 answers
冷血范
Reply 2 · 2020-02-05 06:39

Match a word starting with # after whitespace or at the beginning of a line. Depending on your usage, the trailing word boundary may not be necessary.

/(?:^|\s)\#(\w+)\b/

The parentheses capture the word in a group. How you apply this regex depends on the language.

The (?:...) is a non-capturing group.
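
Since the question is tagged C#, here is a minimal sketch of how the capture group could be read there. The class and method names (`HashWords`, `Extract`) are made up for illustration. Note that in .NET `^` only matches the start of the input unless `RegexOptions.Multiline` is set.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class HashWords
{
    // Hypothetical helper: returns the words that follow '#', without the '#'.
    public static List<string> Extract(string input)
    {
        var words = new List<string>();
        // (?:^|\s) requires start of input or whitespace before '#';
        // group 1 holds the word itself.
        foreach (Match m in Regex.Matches(input, @"(?:^|\s)#(\w+)\b"))
        {
            words.Add(m.Groups[1].Value);
        }
        return words;
    }

    static void Main()
    {
        string s = "Lorem ipsum #text Second lorem ipsum. How #are You. It's ok. Done. Something #else now.";
        Console.WriteLine(string.Join(", ", Extract(s)));  // text, are, else
    }
}
```

Use `m.Value` instead of `m.Groups[1].Value` only if the leading whitespace and the `#` are wanted as part of the result.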

家丑人穷心不美
Reply 3 · 2020-02-05 06:41

Try this: #(\S+)\s?
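
One thing to be aware of with `\S+` is that it takes every non-whitespace character, so trailing punctuation ends up in the capture. A small C# sketch comparing it with `\w+` (the helper name `Captures` and the sample string are only for illustration):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class NonWhitespaceDemo
{
    // Hypothetical helper: returns group 1 of every match of the given pattern.
    public static string[] Captures(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Groups[1].Value).ToArray();

    static void Main()
    {
        string s = "See #tag, then #other.";
        // \S+ keeps the comma and the full stop.
        Console.WriteLine(string.Join(" | ", Captures(s, @"#(\S+)\s?")));  // tag, | other.
        // \w+ stops at the first non-word character.
        Console.WriteLine(string.Join(" | ", Captures(s, @"#(\w+)")));     // tag | other
    }
}
```

For the sample sentence in the question both patterns give the same three words, because each tag there is followed by a space.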

ゆ 、 Hurt°
Reply 4 · 2020-02-05 06:48

Search for:

  • something that is not a word character then
  • #
  • some word characters

So try this:

/(?<!\w)#\w+/

Or in C# it would look like this:

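using System;
using System.Text.RegularExpressions;

// Print every word that starts with '#' and is not preceded by a word character.
string s = "Lorem ipsum #text Second lorem ipsum. How #are You. It's ok. Done. Something #else now.";
foreach (Match match in Regex.Matches(s, @"(?<!\w)#\w+"))
{
    Console.WriteLine(match.Value);
}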

Output:

#text
#are
#else
This account has been banned
Reply 5 · 2020-02-05 06:48

The patterns below should solve this.

  • /\$(\w+)/g searches for words that start with $
  • /#(\w+)/g searches for words that start with #

The answer /(?<!\w)#\w+/ given by Mark Bayers triggers the following warning on the RegExr.com website:

"(?<!" The "negative lookbehind" feature may not be supported in all browsers.

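The warning goes away if you remove the < and use (?!\w)#\w+, but that turns the lookbehind into a negative lookahead. The lookahead always succeeds in front of #, so the pattern behaves like plain #\w+ and no longer rejects a # in the middle of a word such as foo#bar.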
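
To see how the lookbehind and lookahead variants actually differ, here is a small C# comparison. The helper `Find` and the sample string are made up, and the third pattern is only one possible workaround for engines without lookbehind, not something taken from the answers above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class LookaroundDemo
{
    // Hypothetical helper: returns the full text of every match.
    public static string[] Find(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    // Hypothetical helper: returns group 1 of every match.
    public static string[] FindGroup(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Groups[1].Value).ToArray();

    static void Main()
    {
        string s = "foo#bar and #baz";

        // Lookbehind: '#' must not be preceded by a word character.
        Console.WriteLine(string.Join(", ", Find(s, @"(?<!\w)#\w+")));          // #baz

        // Lookahead: only checks that the next character ('#') is not a word
        // character, which is always true, so the mid-word '#' matches too.
        Console.WriteLine(string.Join(", ", Find(s, @"(?!\w)#\w+")));           // #bar, #baz

        // Possible lookbehind-free alternative: consume the preceding
        // non-word character (or start of input) and capture the tag.
        Console.WriteLine(string.Join(", ", FindGroup(s, @"(?:^|\W)(#\w+)")));  // #baz
    }
}
```

.NET itself supports lookbehind, so the warning only matters when the same pattern has to run in an engine that lacks it.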

一纸荒年 Trace。
Reply 6 · 2020-02-05 07:04

To accommodate different languages I have this (PCRE/PHP):

'~(?<!\p{Latin})#(\p{Latin}+)~u'

or

$language = 'Latin'; // e.g. taken from a form value
$pattern = '~(?<!\p{' . $language . '})#(\p{' . $language . '}+)~u';

or to cycle through multiple scripts

// $languageArray is assumed to hold Unicode script names, e.g. ['Latin', 'Cyrillic'].
$languages = $languageArray;

$replacePattern = [];

// Build one pattern per script.
foreach ($languages as $language) {
  $replacePattern[] = '~(?<!\p{' . $language . '})#(\p{' . $language . '}+)~u';
}

$replacement = '<html>$1</html>';

$replaceText = preg_replace($replacePattern, $replacement, $text);

\w works great, but as far as I've seen it only works for Latin script.

Switch Latin for Cyrillic or Phoenician in the above example.

The above example does not work for 'RTL' scripts.
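
For the C# case in the question, things look a bit different: as far as I know, .NET's \w is Unicode-aware by default, and .NET supports general categories such as \p{L} and named blocks such as \p{IsCyrillic}, but not script names like \p{Latin}. A minimal sketch under those assumptions (the helper `Find` and the sample string are made up):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class UnicodeWordDemo
{
    // Hypothetical helper: returns the full text of every match.
    public static string[] Find(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    static void Main()
    {
        string s = "Привет #мир and #world";

        // \w covers letters from any script in .NET: finds #мир and #world.
        string[] anyScript = Find(s, @"(?<!\w)#\w+");

        // Restrict to one block, similar in spirit to \p{Cyrillic} in PCRE: finds only #мир.
        string[] cyrillicOnly = Find(s, @"(?<!\p{IsCyrillic})#\p{IsCyrillic}+");

        Console.WriteLine(anyScript.Length);     // 2
        Console.WriteLine(cyrillicOnly.Length);  // 1
    }
}
```

The counts are printed rather than the matches themselves so the output does not depend on the console encoding.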
