Regexp Greek chars by number

Posted 2020-04-19 04:17

Question:

I deal with strings that contain Greek and English (Latin) text. I'd like to use a regex to catch all the Greek words that contain 4 or more characters.

From the regexp manual I figured out that I can use \p{Greek} to match Greek text and \w{4,} to match words of 4 or more characters. However, in the various tests I ran, I could not get the two to work together.

Is there any way to do what I want with a single regular expression? The strings are UTF-8 and come from tweets.

Regards

Answer 1:

Are you using the UTF-8 pattern modifier (/u)? You can also put the {4,} quantifier directly on the \p{Greek} property:

/\p{Greek}{4,}/u
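As a rough illustration of how this pattern might be used in Ruby, here is a minimal sketch. The sample string and the names `text`, `GREEK_WORD`, and `greek_words` are made up for the example, and it assumes the accented letters are precomposed (NFC) characters, since a separate combining accent would not count as a Greek-script character and would split a word.

```ruby
# Hypothetical sample text mixing Greek and Latin words (assumed to be UTF-8, NFC).
text = "Καλημέρα world και hello στον κόσμο"

# Four or more consecutive characters from the Greek script.
GREEK_WORD = /\p{Greek}{4,}/u

# String#scan returns every non-overlapping match as an array of strings.
greek_words = text.scan(GREEK_WORD)

puts greek_words.inspect
# => ["Καλημέρα", "στον", "κόσμο"]
```

The three-letter word "και" is skipped because it is shorter than four characters, and the Latin words are skipped because they contain no Greek-script characters.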


Tags: ruby regex utf-8