I'm not very good with regular expressions and hope someone can explain this to me. I found the pattern below in code I'm debugging, and I wonder why the match always fails in this scenario.
I know \p{L} matches a single code point in the category "letter", and 0-9 is numeric.
// x lets the pattern span several lines
$regExp = '/^\s*
(?P<monthDay>([0-2]?[1-9]|[12]0|3[01]))\s+
(?P<monthNameFull>\p{L}+?)\s+
(?P<yearFull>[12]\d{3})\s*$/ix';
$value = '12 Février 2015';
$matches = array();
$match = preg_match($regExp, $value, $matches);
Additional information: I have reduced it to this test case:
$match = preg_match("/^\s*(?P<monthDay>([0-2]?[1-9]|[12]0|3[01]))\s+(?P<monthNameFull>\p{L}+?)\s+(?P<yearFull>[12]\d{3})\s*$/i", "18 Février 2015");
var_dump($match); //It will print int(0).
But if the value is 18 February 2015, it prints int(1). Why is that? It is supposed to return 1 for both values, because \p{L} should accept Unicode letters.
$regExp = '/^\s*(?P<d>([0-2]?[1-9]|[12]0|3[01]))\s+(?P<m>\p{L}+?)\s+(?P<y>[12]\d{3})\s*$/usD';
$value = '12 Février 2015';
$matches = array();
$match = preg_match($regExp, $value, $matches);
var_dump($matches);
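A name in angle brackets, as in (?P<name>...), must always follow (?P, otherwise the pattern fails to compile and preg_match returns false with a warning. For UTF-8 strings you need the u flag; s makes . match newlines as well, and D makes $ match only at the very end of the subject. The usD combination is easy to remember: it's like the US dollar.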
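As a rough illustration of what each of the three flags changes, here is a minimal sketch that tests them one at a time. The sample strings and variable names are made up for the demonstration and are not from the question; "\u{E9}" is simply an encoding-independent way to write é and assumes PHP 7 or later.

```php
<?php
// Illustrative inputs (assumptions, not taken from the original code).
$month = "F\u{E9}vrier";   // "Février" as UTF-8, independent of the file encoding

// u: treat pattern and subject as UTF-8, so "é" counts as one letter.
$withoutU = preg_match('/^\p{L}+$/', $month);
$withU    = preg_match('/^\p{L}+$/u', $month);

// s: let "." match a newline as well.
$withoutS = preg_match('/^a.b$/', "a\nb");
$withS    = preg_match('/^a.b$/s', "a\nb");

// D: "$" matches only at the very end, not before a trailing newline.
$withoutD = preg_match('/^2015$/', "2015\n");
$withD    = preg_match('/^2015$/D', "2015\n");

var_dump($withoutU, $withU);   // int(0), int(1)
var_dump($withoutS, $withS);   // int(0), int(1)
var_dump($withoutD, $withD);   // int(1), int(0)
```

For the date pattern in the question, u appears to be the only flag that changes the outcome: the pattern contains no dot, and the trailing \s* already swallows a final newline.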
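Named groups are not needed just to test for a match, and the syntax for them in the question is wrong anyway. This cleaned-up version should work, provided the u modifier is set for the accented letters (x lets the pattern span several lines):
/^
\s*([0-2]?[1-9]|[12]0|3[01])\s+
\p{L}+?\s+
[12]\d{3}\s*
$/ux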
The pattern for the day of the month would also be more intelligible as:
(0?[1-9]|[12][0-9]|3[01])
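To check that the more readable day pattern accepts the same strings as the original one, a quick brute-force comparison can help. This is only a sketch: the candidate list ("0" to "40", with and without a leading zero) and the variable names are my own choice, and both patterns are anchored with ^...$ and the D flag so that they are tested on whole strings.

```php
<?php
// Both day-of-month patterns, anchored so they must match the whole string.
$original  = '/^([0-2]?[1-9]|[12]0|3[01])$/D';
$rewritten = '/^(0?[1-9]|[12][0-9]|3[01])$/D';

// Hypothetical test inputs: "0".."40" plus the zero-padded forms "00".."09".
$candidates = [];
for ($n = 0; $n <= 40; $n++) {
    $candidates[] = (string) $n;
    $candidates[] = sprintf('%02d', $n);
}
$candidates = array_unique($candidates);

// Collect every input on which the two patterns disagree.
$differences = [];
foreach ($candidates as $candidate) {
    if (preg_match($original, $candidate) !== preg_match($rewritten, $candidate)) {
        $differences[] = $candidate;
    }
}

var_dump($differences);   // an empty array means the patterns agree on all inputs
```

On these inputs the two patterns should agree everywhere: both accept 1 to 31 and 01 to 09, and both reject 0, 00 and anything above 31.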
Figured out a fix: use the /u modifier instead of /i.
$match = preg_match("/^\s*(?P<monthDay>([0-2]?[1-9]|[12]0|3[01]))\s+(?P<monthNameFull>\p{L}+?)\s+(?P<yearFull>[12]\d{3})\s*$/u", "18 Février 2015");
var_dump($match); //It will print int(1).
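A likely explanation for why /u makes the difference, assuming the string is UTF-8 encoded: é is stored as two bytes, and without /u the regex engine looks at each byte separately. The sketch below (variable names are mine, "\u{E9}" assumes PHP 7 or later) shows the byte view and the code point view side by side.

```php
<?php
$month = "F\u{E9}vrier";             // "Février" in UTF-8

$byteLength = strlen($month);         // strlen counts bytes, not characters
$eAcuteHex  = bin2hex("\u{E9}");      // the two bytes that encode "é"

// Without /u each byte is matched on its own, as a character below 256.
$firstByteIsLetter  = preg_match('/^\p{L}$/', "\xC3");   // read as U+00C3 "Ã"
$secondByteIsLetter = preg_match('/^\p{L}$/', "\xA9");   // read as U+00A9 "©"

// With /u the two bytes are decoded into a single code point first.
$eAcuteIsLetter = preg_match('/^\p{L}$/u', "\u{E9}");

var_dump($byteLength, $eAcuteHex);                      // int(8), string(4) "c3a9"
var_dump($firstByteIsLetter, $secondByteIsLetter);      // int(1), int(0)
var_dump($eAcuteIsLetter);                              // int(1)
```

So in byte mode \p{L}+ stops at the second byte of é, the following \s+ cannot match, and the whole pattern fails.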
Thanks all for all the help
Use the u modifier for Unicode:
$regExp = '/^\s*
(?P<monthDay>([0-2]?[1-9]|[12]0|3[01]))\s+
(?P<monthNameFull>\p{L}+?)\s+
(?P<yearFull>[12]\d{3})\s*$/ux';
// u is the modifier that matters here; x only allows the line breaks in the pattern
The i modifier is not needed: \p{L} matches both uppercase and lowercase letters.
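A small sketch to back this up, with made-up sample strings written via "\u{...}" escapes (PHP 7 or later) so the result does not depend on the file encoding. It also shows where i still matters, namely for literal letters in the pattern.

```php
<?php
$lower = "f\u{E9}vrier";   // "février"
$upper = "F\u{C9}VRIER";   // "FÉVRIER"

// \p{L} accepts letters of either case, no i modifier required.
$lowerMatches = preg_match('/^\p{L}+$/u', $lower);
$upperMatches = preg_match('/^\p{L}+$/u', $upper);

// i only matters when the pattern spells out literal letters.
$literalWithoutI = preg_match('/^' . $lower . '$/u', $upper);
$literalWithI    = preg_match('/^' . $lower . '$/iu', $upper);

var_dump($lowerMatches, $upperMatches);       // int(1), int(1)
var_dump($literalWithoutI, $literalWithI);    // int(0), int(1)
```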