Regular expression to match name initials - PCRE

Posted 2020-04-19 05:49

够拽才男人

Question:

I have the following regular expression to get the initials of a name:

/\b\p{L}\./gu

It works fine with English and other languages until the text contains combining characters (a base letter followed by a combining mark). For example,
क in Hindi and
ಕ in Kannada are matched.
But
के in Hindi and
ಕೆ in Kannada are not matched by this regex.
I am trying to get the initials from a name like J.P.Morgan, etc.
Any help would be greatly appreciated.

Answer 1:

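In के and ಕೆ the vowel sign is a separate combining mark (\p{M}) that follows the base letter, so \p{L}\. fails because the dot no longer comes directly after the letter. You need to match the combining marks after the base letter using \p{M}*: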

'~\b(?<!\p{M})\p{L}\p{M}*\.~u'

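Pattern details:

\b - a word boundary
(?<!\p{M}) - the character before the current position must not be a combining mark (without this check, a match can start in the middle of a word, right after a mark)
\p{L} - any Unicode letter (the base letter)
\p{M}* - zero or more combining marks (the demo below uses the possessive form \p{M}*+, which matches the same text but never backtracks)
\. - a literal dot.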
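To see why the \p{L} and \p{M}* parts are both needed, here is a minimal sketch that splits a string into code points and reports how PCRE classifies each one. The helper name describeCodePoints() is hypothetical and only used for illustration; it is not part of the original answer.

```php
<?php
// Sketch: classify each code point of a UTF-8 string as PCRE sees it.
// describeCodePoints() is a hypothetical helper used only for illustration.
function describeCodePoints(string $text): array
{
    $classes = [];
    // Splitting on the empty pattern with the u flag yields single code points
    // (not grapheme clusters), so a letter and its mark come out separately.
    foreach (preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        if (preg_match('/^\p{L}$/u', $char)) {
            $classes[] = 'letter';
        } elseif (preg_match('/^\p{M}$/u', $char)) {
            $classes[] = 'mark';
        } else {
            $classes[] = 'other';
        }
    }
    return $classes;
}

print_r(describeCodePoints('क.'));  // letter, other
print_r(describeCodePoints('के.')); // letter, mark, other
```

The second call shows the mark sitting between the letter and the dot, which is the gap that \p{M}* fills.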

PHP demo:

$s = "क. ಕ. के. ಕೆ. ";
echo preg_replace('~\b(?<!\p{M})\p{L}\p{M}*+\.~u', '<pre>$0</pre>', $s); 
// => <pre>क.</pre> <pre>ಕ.</pre> <pre>के.</pre> <pre>ಕೆ.</pre>
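If the goal is to collect the initials rather than wrap them, the same pattern can be passed to preg_match_all. This is a rough sketch under that assumption; extractInitials() is a hypothetical helper name, not something from the question or the answer.

```php
<?php
// Sketch: return every initial (base letter, optional combining marks, dot).
// extractInitials() is a hypothetical helper used only for illustration.
function extractInitials(string $name): array
{
    // Same pattern as in the demo above.
    $pattern = '~\b(?<!\p{M})\p{L}\p{M}*+\.~u';
    if (preg_match_all($pattern, $name, $matches) === false) {
        return []; // matching failed, e.g. the input is not valid UTF-8
    }
    return $matches[0]; // full matches only
}

print_r(extractInitials('J.P.Morgan'));   // "J." and "P."
print_r(extractInitials('क. ಕ. के. ಕೆ. ')); // all four initials
```

Returning an empty array on failure is just one possible design choice; throwing an exception may suit stricter code better.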