REGEXP returning false on special characters

Posted 2019-08-09 01:43

Question:

I'm not very good at regular expressions, so I'm hoping someone can explain this to me. I found the following in the code I'm debugging, and I don't understand why it always fails to match (returns 0) in this scenario.

I know that \p{L} matches a single code point in the "letter" category, and that 0-9 are the digits.

$regExp = '/^\s*
     (?P<monthDay>([0-2]?[1-9]|[12]0|3[01]))\s+
     (?P<monthNameFull>\p{L}+?)\s+
     (?P<yearFull>[12]\d{3})\s*$/ix'; // x: ignore the whitespace and line breaks used for layout

$value = '12 Février 2015';
$matches = array();

$match = preg_match($regExp, $value, $matches);

Additional information: I also tried this:

$match = preg_match("/^\s*(?P<monthDay>([0-2]?[1-9]|[12]0|3[01]))\s+(?P<monthNameFull>\p{L}+?)\s+(?P<yearFull>[12]\d{3})\s*$/i", "18 Février 2015");
var_dump($match); //It will print int(0).

But if the value is 18 February 2015, it prints int(1). Why? It is supposed to return 1 for both values, because \p{L} accepts Unicode characters.
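The two results above can be reproduced with a short standalone script. This is a minimal sketch, assuming PHP 7 or later (for the `\u{...}` string escape); the variable names are made up for the example. Counting matches of `.` with and without `u` hints at what is going on: without `u`, PCRE sees the UTF-8 "é" as two separate bytes.

```php
<?php
// "Février" is written with a \u{...} escape so the example does not depend on
// the file's encoding; PHP stores it as UTF-8, where "é" is two bytes.
$french  = "18 F\u{00E9}vrier 2015";
$english = '18 February 2015';

// The one-line pattern from the question, with only the i modifier.
$regExp = '/^\s*(?P<monthDay>([0-2]?[1-9]|[12]0|3[01]))\s+(?P<monthNameFull>\p{L}+?)\s+(?P<yearFull>[12]\d{3})\s*$/i';

$frenchResult  = preg_match($regExp, $french);   // 0, as reported above
$englishResult = preg_match($regExp, $english);  // 1

// Without u, "." matches single bytes; with u, it matches whole code points.
$byteCount      = preg_match_all('/./', $french);   // 16
$codePointCount = preg_match_all('/./u', $french);  // 15

var_dump($frenchResult, $englishResult, $byteCount, $codePointCount);
```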

Answer 1:

$regExp = '/^\s*(?P<d>([0-2]?[1-9]|[12]0|3[01]))\s+(?P<m>\p{L}+?)\s+(?P<y>[12]\d{3})\s*$/usD';

$value = '12 Février 2015';
$matches = array();

$match = preg_match($regExp, $value, $matches);

var_dump($matches);

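You always have to write a <name> after (?P; otherwise the pattern fails to compile and preg_match() returns false. For Unicode (UTF-8) strings you need the u flag, and I also add s (. matches newlines) and D ($ matches only at the very end of the string) for input that may span several lines. It's easy to remember: usD, like US dollars...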
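For reference, here is a minimal sketch of what the s and D modifiers change on their own; the sample strings are invented for the illustration. Neither of them changes how \p{L} treats "é"; that is the job of u.

```php
<?php
// D: without it, "$" may also match just before a trailing "\n".
$year = "2015\n";
$withoutD = preg_match('/^\d{4}$/', $year);   // 1: "$" matches before the final newline
$withD    = preg_match('/^\d{4}$/D', $year);  // 0: "$" must be the very end of the subject

// s: without it, "." does not match "\n".
$twoLines = "a\nb";
$withoutS = preg_match('/^a.b$/', $twoLines);   // 0
$withS    = preg_match('/^a.b$/s', $twoLines);  // 1

var_dump($withoutD, $withD, $withoutS, $withS);
```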



Answer 2:

No named groups are needed, and their syntax seems to be wrong anyway, so this cleaned-up version should work:

/^\s*([0-2]?[1-9]|[12]0|3[01])\s+\p{L}+?\s+[12]\d{3}\s*$/iu

The pattern for the day of the month would also be more intelligible as:

(0?[1-9]|[12][0-9]|3[01])
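To check that the simpler day-of-month pattern accepts exactly the same strings as the original one, a brute-force comparison over every one- and two-digit string is enough, since neither pattern can match more than two characters. A minimal sketch; the ^...$ anchors and the non-capturing (?:...) groups are added only for this comparison.

```php
<?php
// Day-of-month pattern from the question, and the simpler version from this answer.
$original = '/^(?:[0-2]?[1-9]|[12]0|3[01])$/';
$simpler  = '/^(?:0?[1-9]|[12][0-9]|3[01])$/';

$mismatches = []; // strings on which the two patterns disagree
$accepted   = []; // strings the simpler pattern accepts

// Every string "0".."9" and "00".."99".
for ($len = 1; $len <= 2; $len++) {
    for ($n = 0; $n < 10 ** $len; $n++) {
        $candidate = str_pad((string) $n, $len, '0', STR_PAD_LEFT);
        $a = preg_match($original, $candidate);
        $b = preg_match($simpler, $candidate);
        if ($a !== $b) {
            $mismatches[] = $candidate;
        }
        if ($b === 1) {
            $accepted[] = $candidate;
        }
    }
}

echo count($mismatches) === 0
    ? 'same strings accepted: ' . count($accepted) . "\n"
    : 'patterns differ on: ' . implode(', ', $mismatches) . "\n";
```

Both patterns accept 1-9, 01-09 and 10-31, and reject everything else (0, 00, 32 and above).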



Answer 3:

I figured out a fix: use /u instead of /i.

$match = preg_match("/^\s*(?P<monthDay>([0-2]?[1-9]|[12]0|3[01]))\s+(?P<monthNameFull>\p{L}+?)\s+(?P<yearFull>[12]\d{3})\s*$/u", "18 Février 2015");
var_dump($match); //It will print int(1).
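A minimal sketch that tries every combination of the i and u modifiers on the same input, assuming PHP 7 or later for the `\u{...}` escape; `$body` and `$results` are names made up for the example. Only the variants that include u match, and with u the named group captures the whole month name.

```php
<?php
$value = "18 F\u{00E9}vrier 2015"; // UTF-8 "18 Février 2015"

// The question's pattern without delimiters or modifiers.
$body = '^\s*(?P<monthDay>([0-2]?[1-9]|[12]0|3[01]))\s+(?P<monthNameFull>\p{L}+?)\s+(?P<yearFull>[12]\d{3})\s*$';

// Try every combination of the i and u modifiers.
$results = [];
foreach (['', 'i', 'u', 'iu'] as $modifiers) {
    $results[$modifiers] = preg_match('/' . $body . '/' . $modifiers, $value);
}
var_dump($results); // only the variants containing u match

// With u, the named groups capture the full month name.
preg_match('/' . $body . '/u', $value, $matches);
echo $matches['monthDay'], ' | ', $matches['monthNameFull'], ' | ', $matches['yearFull'], "\n";
```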

Thanks, everyone, for all the help.



Answer 4:

Use the u modifier for Unicode:

$regExp = '/^\s*
   (?P<monthDay>([0-2]?[1-9]|[12]0|3[01]))\s+
   (?P<monthNameFull>\p{L}+?)\s+
   (?P<yearFull>[12]\d{3})\s*$/ux';
//                      here __^
// (x only lets the pattern span several lines; u is the actual fix)

The i modifier is not needed: \p{L} matches letters of any case, so it accepts both "février" and "Février".
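A minimal sketch showing that \p{L} accepts upper- and lowercase letters without the i modifier (PHP 7+ for the `\u{...}` escapes; the sample words are made up). For contrast, the narrower property \p{Lu} matches only uppercase letters.

```php
<?php
// \p{L} covers every letter, upper- and lowercase, so no i modifier is needed.
$lower = preg_match('/^\p{L}+$/u', "f\u{00E9}vrier");  // 1
$upper = preg_match('/^\p{L}+$/u', "F\u{00C9}VRIER");  // 1

// \p{Lu} (uppercase letter) rejects the lowercase word.
$lowerAsLu = preg_match('/^\p{Lu}+$/u', "f\u{00E9}vrier");  // 0

var_dump($lower, $upper, $lowerAsLu);
```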



Tags: php regex letter