
Question:

Thanks for the answers to "regular expression to detect numbers written as words".

I now have this working. However, I have the same requirement where the number words are in Arabic (or any other UTF-8 encoded language) rather than English, so:

if (preg_match("/\p{L}\b(?:(?:واحد|اثنان|ثلاثة|أربعة|خمسة|ستة|سبعة|ثمانية|تسعة|صفر|عشرة)\b\s*?){4}/", $str, $matches) > 0) 
   return true;

does not work. I've googled and there seem to be quite a few issues with preg_match and UTF-8 strings, but I couldn't get any of the suggestions I found to work. Any help is much appreciated.

Answer 1:

Note that \b may not be working as you expect. \b specifies a word boundary, but what is considered a word character by PCRE depends on what locale the script is running in (take a look towards the bottom of the PCRE escape sequences manual page):

A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word". The definition of letters and digits is controlled by PCRE's character tables, and may vary if locale-specific matching is taking place. For example, in the "fr" (French) locale, some character codes greater than 128 are used for accented letters, and these are matched by \w.

You might also want to read Handling UTF-8 with PHP (the section on PCRE in particular).

Instead, you could use a lookaround in conjunction with a Unicode character property to emulate a word boundary: (?<=\P{L}). This asserts that the previous character is not a Unicode "letter".

So all together it would look like:

/(?<=\P{L})(?:(?:واحد|اثنان|ثلاثة|أربعة|خمسة|ستة|سبعة|ثمانية|تسعة|صفر|عشرة)\s*?){4}/
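
A minimal sketch of this lookaround approach follows. It makes a few assumptions that are not in the answer above: it adds the u modifier so that \p{L} is evaluated on whole UTF-8 code points, it uses negative lookarounds ((?<!\p{L}) and (?!\p{L})) so that a number word at the very start or end of the string still counts, and it wraps everything in a hypothetical helper named hasFourNumberWords:

```php
<?php
// Hypothetical helper (not from the original answer): true when $str contains
// four consecutive Arabic number words, each one delimited by non-letters.
function hasFourNumberWords(string $str): bool
{
    $words = 'واحد|اثنان|ثلاثة|أربعة|خمسة|ستة|سبعة|ثمانية|تسعة|صفر|عشرة';

    // (?<!\p{L}) and (?!\p{L}) emulate \b using the Unicode "letter" property.
    // The trailing u makes PCRE read the pattern and the subject as UTF-8.
    $pattern = '/(?:(?<!\p{L})(?:' . $words . ')(?!\p{L})\s*){4}/u';

    return preg_match($pattern, $str) === 1;
}

var_dump(hasFourNumberWords('واحد ستة سبعة تسعة')); // four number words in a row
var_dump(hasFourNumberWords('لدي أربعة أولاد'));     // only one number word
```

The lookarounds consume no characters, so the \s* between words only has to deal with whitespace. Arabic diacritics are combining marks (\p{M}) rather than letters, which this sketch does not account for.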

Answer 2:

Convert both the pattern and $str to windows-1256, do the matching, then convert the $matches items back (if needed). This is the solution I arrived at after struggling with this for some time.

$pattern="/\p{L}\b(?:(?:واحد|اثنان|ثلاثة|أربعة|خمسة|ستة|سبعة|ثمانية|تسعة|صفر|عشرة)\b\s*?){4}/";
$pattern_windows1256 = iconv('utf-8', 'windows-1256', $pattern);
$str_windows1256 = iconv('utf-8', 'windows-1256', $str);
if (preg_match($pattern_windows1256, $str_windows1256, $matches) > 0) 
   return true;

Here's a test example to check whether the encoding conversion lets Arabic letters match in preg_match:

<?php
// Simplified pattern: match any single number word.
$pattern="/(واحد|اثنان|ثلاثة|أربعة|خمسة|ستة|سبعة|ثمانية|تسعة|صفر|عشرة)/";
$pattern_windows1256 = iconv('utf-8', 'windows-1256', $pattern);

$test_cases=array(
    'لدي أربعة أولاد',
    'قفز الثعلب فوق الشجرة',
    'عندي خمسة أرانب',
);
foreach ($test_cases as $str) {
    // Convert each subject to the same single-byte encoding as the pattern.
    $str_windows1256 = iconv('utf-8', 'windows-1256', $str);

    if (preg_match($pattern_windows1256, $str_windows1256, $matches) > 0) {
        echo $str, '<br />';
    }
}

When executed, it outputs:

لدي أربعة أولاد
عندي خمسة أرانب

I removed part of the pattern to check whether plain matching against Arabic works, and it seems to.
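
The answer mentions converting the $matches items back but does not show that step. A minimal sketch of it follows, assuming the source file is saved as UTF-8 and that the local iconv build supports windows-1256; the $matches_utf8 array is a name introduced here for illustration:

```php
<?php
// Pattern and subject start out as UTF-8 (the encoding of this source file).
$pattern = '/(واحد|اثنان|ثلاثة|أربعة|خمسة|ستة|سبعة|ثمانية|تسعة|صفر|عشرة)/';
$str     = 'عندي خمسة أرانب';

// Convert both to the single-byte windows-1256 encoding before matching.
$pattern_windows1256 = iconv('utf-8', 'windows-1256', $pattern);
$str_windows1256     = iconv('utf-8', 'windows-1256', $str);

$matches_utf8 = array();
if (preg_match($pattern_windows1256, $str_windows1256, $matches) > 0) {
    // $matches holds windows-1256 bytes; convert each item back to UTF-8.
    foreach ($matches as $key => $item) {
        $matches_utf8[$key] = iconv('windows-1256', 'utf-8', $item);
    }
}

// Print the captured number word (as UTF-8 again), or a note if nothing matched.
echo isset($matches_utf8[1]) ? $matches_utf8[1] : 'no match', "\n";
```

Note that iconv() returns false when a character cannot be represented in the target encoding, so this round trip only suits text that fits entirely within windows-1256.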

Answer 3:

You can add the u pattern modifier so that both the pattern and the subject are treated as UTF-8, which works for any language encoded in UTF-8.

if (preg_match("/\p{L}\b(?:(?:واحد|اثنان|ثلاثة|أربعة|خمسة|ستة|سبعة|ثمانية|تسعة|صفر|عشرة)\b\s*?){4}/u", $str, $matches) > 0)

Resources:

Pattern modifiers
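
To illustrate what the u modifier changes, here is a small sketch (the variable names are arbitrary, and the file is assumed to be saved as UTF-8). It compares how "." behaves with and without u, and shows what preg_match() does when the subject is not valid UTF-8:

```php
<?php
$letter = 'ص'; // one Arabic letter, stored as two bytes in UTF-8

// Without u, "." matches a single byte, so a two-byte letter cannot match ^.$
$withoutU = preg_match('/^.$/', $letter);

// With u, "." matches a whole code point, so the same subject matches.
$withU = preg_match('/^.$/u', $letter);

// With u, a subject that is not valid UTF-8 makes preg_match() return false
// rather than 0; preg_last_error() then reports PREG_BAD_UTF8_ERROR.
$invalid   = preg_match('/./u', "\xFF");
$errorCode = preg_last_error();

var_dump($withoutU, $withU, $invalid, $errorCode === PREG_BAD_UTF8_ERROR);
```

Because false > 0 is also false, a check written as preg_match(...) > 0 treats invalid UTF-8 as a plain non-match. Comparing the return value against false explicitly distinguishes the two cases.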
