Regex to strip out everything but words and numbers

I'm trying to clean a POST string used in an AJAX request (sanitizing it before a DB query) so that it allows only alphanumeric characters, single spaces between words (no runs of multiple spaces), the hyphen "-", and Latin characters such as "ç" and "é", but without success. Can anyone help or point me in the right direction?

This is the regex I'm using so far:

$string = preg_replace('/^[a-z0-9 àáâãäåçèéêëìíîïðñòóôõöøùúû-]+$/', '', mb_strtolower(utf8_encode($_POST['q'])));

Thank you.

Tags: php regex punctuation

3 Answers

够拽才男人

#2 · 2019-09-17 11:13

$string = mb_strtolower(utf8_encode($_POST['q']));
// PHP has no g modifier (preg_replace already replaces every match); u enables UTF-8 matching
$string = preg_replace('/[^a-z0-9 àáâãäåçèéêëìíîïðñòóôõöøùúû-]+/u', '', $string);
$string = preg_replace('/ +/', ' ', $string);

Why not just use mysql_real_escape_string?


仙女界的扛把子

#3 · 2019-09-17 11:21

$string = preg_replace('/[^a-z0-9 àáâãäåçèéêëìíîïðñòóôõöøùúû\-]/u', '', mb_strtolower(utf8_encode($_POST['q']), 'UTF-8'));
$string = preg_replace( '/ +/', ' ', $string );

should do the trick. Note that:

the character class is negated by putting ^ as the first character inside the character class;
you need the u flag when either the pattern or the subject contains Unicode (UTF-8) characters;
it's better to specify the character set explicitly in mb_* functions, because otherwise they fall back on your system defaults, which may not be UTF-8;
escaping the hyphen (\- instead of -) is not strictly required at the end of a character class, but it keeps the pattern from breaking if more characters are appended later.
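
As a rough illustration of the points above, the two replacements can be wrapped in a small helper. This is only a sketch: the function name clean_query is hypothetical, the source file is assumed to be saved as UTF-8, and the input is assumed to already be valid UTF-8, so utf8_encode() is left out here.

```php
<?php
// Hypothetical helper (the name is an assumption, not from the thread).
function clean_query(string $input): string
{
    // Lowercase with an explicit encoding instead of relying on system defaults.
    $s = mb_strtolower($input, 'UTF-8');

    // Remove every character outside the whitelist.
    // The u flag makes PCRE treat pattern and subject as UTF-8.
    // preg_replace() returns null on invalid UTF-8, hence the fallback to ''.
    $s = preg_replace('/[^a-z0-9 àáâãäåçèéêëìíîïðñòóôõöøùúû\-]/u', '', $s) ?? '';

    // Collapse runs of spaces into one, then trim the ends.
    $s = preg_replace('/ +/', ' ', $s);

    return trim($s);
}

echo clean_query("Café  au-lait!! <b>"), "\n"; // café au-lait b
```

The final trim() is an addition beyond the original answer; it only removes a leading or trailing space left over after stripping.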


放荡不羁爱自由

#4 · 2019-09-17 11:22

$regEx = '/^[^\w\p{L}-]+$/iu';

\w - matches alphanumerics and the underscore

\p{L} - matches a single Unicode Code Point in the 'Letters' category (see the Unicode Categories section here).

- at the end of the character class matches a single hyphen.

^ at the start of the character class negates it, so that the regex matches the opposite of the character class (anything you do not specify).

+ outside of the character class says match 1 or more characters

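^ and $ outside of the character class cause the engine to accept only matches that start at the beginning of the string and run to its end. As written, the pattern therefore matches only when the whole string consists of disallowed characters; to strip such characters from anywhere in the string with preg_replace, drop the two anchors.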

After the pattern, the i modifier says to ignore case and the u modifier tells the pattern-matching engine that we're going to be sending UTF-8 data its way. The g modifier originally present has been removed, since it does not exist in PHP (global matching depends instead on which matching function is called).
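
A minimal sketch of how this Unicode-category approach might be applied with preg_replace follows. The helper name strip_non_word is hypothetical, the anchors are omitted so that matches anywhere in the string are removed, and a space is added to the class (an assumption based on the question) so that words stay separated.

```php
<?php
// Hypothetical helper (illustrative name, not from the thread).
function strip_non_word(string $input): string
{
    // Negated class: anything that is not a word character, a Unicode letter,
    // a space, or a hyphen gets removed. Note that \w also lets "_" through.
    // No ^...$ anchors, so disallowed runs are stripped wherever they occur.
    $s = preg_replace('/[^\w\p{L} -]+/u', '', $input) ?? '';

    // Collapse repeated spaces and trim the ends.
    return trim(preg_replace('/ +/', ' ', $s));
}

echo strip_non_word("Ça va?  Très_bien - 100%!"), "\n"; // Ça va Très_bien - 100
```

Unlike the explicit whitelist, this version accepts letters from any script, which may be broader than the question intends.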

