I'm having trouble with (German) special characters in URIs and want to resolve it with a Regex route and the PCRE pattern modifier for UTF-8, u.
'router' => array(
    'routes' => array(
        // ...
        'city' => array(
            'type' => 'regex',
            'options' => array(
                'regex' => '/catalog/(?<city>[a-zA-Z0-9_-äöüÄÖÜß]*)\/u',
                'defaults' => array(
                    'controller' => 'Catalog\Controller\Catalog',
                    'action' => 'list-sports',
                ),
                'spec' => '/catalog/%city%',
            ),
            'may_terminate' => true,
        ),
    ),
),
But when I set it, the route stops working altogether (error 404), both for URIs with special characters and for ones without.
How do I set the modifier correctly?
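Since I already had this open, here's a handler that solves the problem. As the code shows, the route wraps the configured regex in its own delimiters, so the modifier has to be appended there; a /u written into the config is matched literally instead.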
<?php
namespace Application\Mvc\Router\Http;

use Zend\Mvc\Router\Http\Regex;
use Zend\Mvc\Router\Http\RouteMatch;
use Zend\Stdlib\RequestInterface as Request;

class UnicodeRegex extends Regex
{
    /**
     * match(): defined by RouteInterface interface.
     *
     * @param  Request $request
     * @param  integer $pathOffset
     * @return RouteMatch|null
     */
    public function match(Request $request, $pathOffset = null)
    {
        if (!method_exists($request, 'getUri')) {
            return null;
        }

        $uri = $request->getUri();
        // Decode the path before matching so the regex sees "ü" rather than "%C3%BC".
        $path = rawurldecode($uri->getPath());

        // Wrap the regex in delimiters and append the u (UTF-8) modifier.
        if ($pathOffset !== null) {
            // Flags must be an integer; passing null is deprecated in newer PHP versions.
            $result = preg_match('(\G' . $this->regex . ')u', $path, $matches, 0, $pathOffset);
        } else {
            $result = preg_match('(^' . $this->regex . '$)u', $path, $matches);
        }

        if (!$result) {
            return null;
        }

        // Length in bytes of the decoded match.
        $matchedLength = strlen($matches[0]);

        // Keep only named, non-empty captures as route parameters.
        foreach ($matches as $key => $value) {
            if (is_numeric($key) || is_int($key) || $value === '') {
                unset($matches[$key]);
            }
        }

        return new RouteMatch(array_merge($this->defaults, $matches), $matchedLength);
    }
}
Assuming you place the file in Application/Mvc/Router/Http/UnicodeRegex, your route definition should look like this:
'router' => array(
    'routes' => array(
        // ...
        'city' => array(
            'type' => 'Application\Mvc\Router\Http\UnicodeRegex',
            'options' => array(
                'regex' => '/catalog/(?<city>[\p{L}]+)',
                // or, if you prefer, your original character class should work too
                // (hyphen moved to the end so it is a literal "-" and not a range from "_" to "ä")
                // 'regex' => '/catalog/(?<city>[a-zA-Z0-9_äöüÄÖÜß-]*)',
                'defaults' => array(
                    'controller' => 'Catalog\Controller\Catalog',
                    'action' => 'list-sports',
                ),
                'spec' => '/catalog/%city%',
            ),
            'may_terminate' => true,
        ),
    ),
),
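To see why a modifier written into the configured regex cannot work, here is a small standalone sketch. It assumes the route wraps the pattern in ( ) delimiters the way the handler above does; matchRoute() is a hypothetical helper made up for this illustration and is not part of Zend Framework.

```php
<?php
// Hypothetical helper (not a Zend API): wraps the configured regex in "(" ")"
// delimiters, as the handler above does, and appends optional modifiers.
function matchRoute(string $regex, string $rawPath, string $modifiers = ''): ?array
{
    $path = rawurldecode($rawPath); // "%C3%BC" becomes the UTF-8 bytes of "ü"
    if (preg_match('(^' . $regex . '$)' . $modifiers, $path, $matches) !== 1) {
        return null;
    }
    return $matches;
}

// 1) Modifier written into the configured regex: "\/u" is matched literally,
//    so only paths that really end in "/u" match.
$withSuffix = '/catalog/(?<city>[a-zA-Z0-9_-]*)\/u';
var_dump(matchRoute($withSuffix, '/catalog/Berlin') === null);   // bool(true)
var_dump(matchRoute($withSuffix, '/catalog/Berlin/u') !== null); // bool(true)

// 2) Modifier appended after the closing delimiter: the decoded path matches.
$unicode = '/catalog/(?<city>\p{L}+)';
$m = matchRoute($unicode, '/catalog/N%C3%BCrnberg', 'u');
var_dump($m['city'] === "N\u{00FC}rnberg");                      // bool(true)
```

The first case is consistent with the 404 in the question: once /u is inside the delimiters it is just two more characters the path has to contain.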
Well,
I guess you can solve it as easily as the many others who have had this same problem. Take a look at one of them:
UTF-8 in * regular expressions
It uses escapes like \s and \p{L}, together with the u modifier, to help you. I hope this solves it! Good luck.
Edit
See my own test:
<?php
$toss_the_dice = utf8_decode ("etc/catalog/Nürnberg");
preg_match ('/\/catalog\/([\\s\\p{L}]*)/m', $toss_the_dice, $dice);
echo utf8_encode ($dice[1]);
// Now it prints
// Nürnberg
?>
Do you see?
Edit 2
This may suit you better:
<?php
$toss_the_dice = "etc/catalog/Nürnberg";
preg_match ('/\/catalog\/([\\s\\p{L}]*)/u', $toss_the_dice, $dice);
echo $dice[1];
// Now it also prints
// Nürnberg
?>
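The difference between the two tests comes down to whether PCRE reads the subject as bytes or as UTF-8 characters. A minimal sketch of that, assuming PHP 7 or later for the "\u{...}" escape (used here so the result does not depend on how the file is saved):

```php
<?php
$u = "\u{00FC}"; // "ü" encoded as UTF-8: two bytes, 0xC3 0xBC

var_dump(strlen($u));                   // int(2)
var_dump(preg_match('/^.$/', $u));      // int(0): without u, "." consumes one byte
var_dump(preg_match('/^.$/u', $u));     // int(1): with u, "." consumes one character
var_dump(preg_match('/^\p{L}$/u', $u)); // int(1): "ü" is a letter
```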