PHP preg_replace() pattern, string sanitization

Posted 2019-07-27 03:08

I have a regex email pattern and would like to strip everything except the characters the pattern allows from a string. In short, I want to sanitize the string.

I'm not a regex guru, so what am I missing in the regex?

<?php

$pattern = "/^([\w\!\#$\%\&\'\*\+\-\/\=\?\^\`{\|\}\~]+\.)*[\w\!\#$\%\&\'\*\+\-\/\=\?\^\`{\|\}\~]+@((((([a-z0-9]{1}[a-z0-9\-]{0,62}[a-z0-9]{1})|[a-z])\.)+[a-z]{2,6})|(\d{1,3}\.){3}\d{1,3}(\:\d{1,5})?)$/i";

$email = 'contact<>@domain.com'; // wrong email

$sanitized_email = preg_replace($pattern, NULL, $email);

echo $sanitized_email; // Should be contact@domain.com

?>

Pattern taken from: http://fightingforalostcause.net/misc/2006/compare-email-regex.php (the very first one...)

2 Answers
在下西门庆
#2 · 2019-07-27 03:59

You cannot filter and match at the same time. You'll need to break it up into a character class for stripping invalid characters and a matching regular expression which verifies a valid address.

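// $filter: negated character class that strips invalid characters
$email = preg_replace($filter, "", $email);
// $verify: anchored pattern that checks the cleaned address
if (preg_match($verify, $email)) {
    // ok, sanitized
    return $email;
}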

For the first case, you want to use a negated character class /[^allowedchars]/.
For the second part you use the structure /^...@...$/.
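
The two steps above can be sketched as follows, using the address from the question. This is only an illustration: the function name sanitize_email is hypothetical, the set of allowed characters is an assumption, and the verification pattern is deliberately simplified rather than RFC-complete.

```php
<?php

// Hypothetical helper: strip disallowed characters, then verify the result.
function sanitize_email(string $email): ?string
{
    // Step 1: a negated character class removes everything outside the allowed set.
    // The allowed set is an assumption; adjust it to your needs.
    $filter = '/[^a-z0-9!#$%&\'*+\-=?^_`{|}~@.\[\]]/i';
    $email = preg_replace($filter, '', $email);

    // Step 2: an anchored pattern of the form /^...@...$/ (simplified on purpose).
    $verify = '/^[a-z0-9._%+\-]+@(?:[a-z0-9\-]+\.)+[a-z]{2,}$/i';

    // Return the cleaned address only if it also passes verification.
    return preg_match($verify, $email) === 1 ? $email : null;
}

echo sanitize_email('contact<>@domain.com'); // contact@domain.com
```

Returning null when verification fails keeps "cleaned but still invalid" input, such as a bare "@", from being mistaken for a usable address.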

Have a look at PHP's filter extension. It uses const unsigned char allowed_list[] = LOWALPHA HIALPHA DIGIT "!#$%&'*+-=?^_`{|}~@.[]"; for cleansing.

And there is the monster for validation: line 525 in http://gcov.php.net/PHP_5_3/lcov_html/filter/logical_filters.c.gcov.php - but check out http://www.regular-expressions.info/email.html for a more common and shorter variant.

淡お忘
#3 · 2019-07-27 04:11

I think PHP's filter_var function can also do this, and in a cleaner way. Have a look at: http://www.php.net/manual/en/function.filter-var.php

Example:

 $email = "chris@exam\\ple.com";
 $cleanEmail = filter_var($email, FILTER_SANITIZE_EMAIL);  // chris@example.com
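
Sanitizing only removes characters; it does not guarantee that what remains is a well-formed address. A minimal sketch, assuming you also want to validate the result, pairs FILTER_SANITIZE_EMAIL with FILTER_VALIDATE_EMAIL. The variable names are illustrative, and the input is the address from the question.

```php
<?php

$email = 'contact<>@domain.com';

// FILTER_SANITIZE_EMAIL strips characters that are not allowed in an address.
$clean = filter_var($email, FILTER_SANITIZE_EMAIL);

// FILTER_VALIDATE_EMAIL returns the address on success and false on failure.
$valid = filter_var($clean, FILTER_VALIDATE_EMAIL);

echo $valid === false ? 'invalid' : $valid; // contact@domain.com
```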