How can I preg_replace special character like '

Posted 2019-01-28 18:32

There are heaps of questions about this on this forum and on the web in general, but I just don't get it.

Here is my code:

function updateGuideKeywords($dal)
{
    $pattern = "/[^a-zA-Z-êàé]/";
    $keywords = preg_replace($pattern, '', $_POST['keywords']);
    echo json_encode($keywords);
}

Now, the input is Prêt-à-porter, and the output is "Pr\u00eat-\u00e0-porter".

Why do I get the '\u00e' ?

And how can I alter my pattern to include the characters ê, à and é ?

EDIT
humm... since it looks like a unicode / character issue, I might go for the solution I found on this page.

Here they suggest doing something like this:

$chain="prêt-à-porter";

$pattern = array("'é'", "'è'", "'ë'", "'ê'", "'É'", "'È'", "'Ë'", "'Ê'", "'á'", "'à'", "'ä'", "'â'", "'å'", "'Á'", "'À'", "'Ä'", "'Â'", "'Å'", "'ó'", "'ò'", "'ö'", "'ô'", "'Ó'", "'Ò'", "'Ö'", "'Ô'", "'í'", "'ì'", "'ï'", "'î'", "'Í'", "'Ì'", "'Ï'", "'Î'", "'ú'", "'ù'", "'ü'", "'û'", "'Ú'", "'Ù'", "'Ü'", "'Û'", "'ý'", "'ÿ'", "'Ý'", "'ø'", "'Ø'", "'œ'", "'Œ'", "'Æ'", "'ç'", "'Ç'");

$replace = array('e', 'e', 'e', 'e', 'E', 'E', 'E', 'E', 'a', 'a', 'a', 'a', 'a', 'A', 'A', 'A', 'A', 'A', 'o', 'o', 'o', 'o', 'O', 'O', 'O', 'O', 'i', 'i', 'i', 'i', 'I', 'I', 'I', 'I', 'u', 'u', 'u', 'u', 'U', 'U', 'U', 'U', 'y', 'y', 'Y', 'o', 'O', 'a', 'A', 'A', 'c', 'C');

$chain = preg_replace($pattern, $replace, $chain);

EDIT 2
This is my solution so far:

function updateGuideKeywords()
{
    //First we replace characters with accents
    $pattern = array("'é'", "'è'", "'ë'", "'ê'", "'É'", "'È'", "'Ë'", "'Ê'", "'á'", "'à'", "'ä'", "'â'", "'å'", "'Á'", "'À'", "'Ä'", "'Â'", "'Å'", "'ó'", "'ò'", "'ö'", "'ô'", "'Ó'", "'Ò'", "'Ö'", "'Ô'", "'í'", "'ì'", "'ï'", "'î'", "'Í'", "'Ì'", "'Ï'", "'Î'", "'ú'", "'ù'", "'ü'", "'û'", "'Ú'", "'Ù'", "'Ü'", "'Û'", "'ý'", "'ÿ'", "'Ý'", "'ø'", "'Ø'", "'œ'", "'Œ'", "'Æ'", "'ç'", "'Ç'");
    $replace = array('e', 'e', 'e', 'e', 'E', 'E', 'E', 'E', 'a', 'a', 'a', 'a', 'a', 'A', 'A', 'A', 'A', 'A', 'o', 'o', 'o', 'o', 'O', 'O', 'O', 'O', 'i', 'i', 'i', 'i', 'I', 'I', 'I', 'I', 'u', 'u', 'u', 'u', 'U', 'U', 'U', 'U', 'y', 'y', 'Y', 'o', 'O', 'a', 'A', 'A', 'c', 'C');
    $shguideID = $_POST['shguideID'];
    $keywords = preg_replace($pattern, $replace, $_POST['keywords']);
    //Then we remove unwanted characters by only allowing a-z, A-Z, comma, 'minus' and white space
    $keywords = preg_replace("/[^a-zA-Z-,\s]/", "", $keywords);

    echo json_encode($keywords);
}

6 answers
时光不老,我们不散
Reply #2 · 2019-01-28 18:51

Your code, with the latest edits so far, works this way:

  1. The expression /[^a-zA-Z-êàé]/ means "match anything that's not an English letter, a minus sign, ê, à or é".

  2. preg_replace($pattern, '', 'Prêt-à-porter') returns 'Prêt-à-porter' since nothing matches.

  3. json_encode() returns the JSON representation of 'Prêt-à-porter', which is "Pr\u00eat-\u00e0-porter".

It's not clear to me what your exact goal is. If you want to remove anything that's not a minus sign or a letter, you can try this pattern:

/[^\p{L}-]/u
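
The escaping described in the steps above can be seen in a minimal sketch like the following. It assumes the script is saved as UTF-8 and runs on PHP 5.4 or later (for JSON_UNESCAPED_UNICODE); the sample strings and variable names are made up for illustration.

```php
<?php
// Assumption: this file is saved as UTF-8.
$keywords = 'Prêt-à-porter';

// By default json_encode() escapes non-ASCII characters as \uXXXX.
$escaped = json_encode($keywords);

// JSON_UNESCAPED_UNICODE (PHP 5.4+) keeps the characters as they are.
$literal = json_encode($keywords, JSON_UNESCAPED_UNICODE);

// Both forms decode back to the same string, so the escaped form loses nothing.
$roundTrip = json_decode($escaped);

// \p{L} matches any Unicode letter; the trailing hyphen in the class is literal.
$lettersOnly = preg_replace('/[^\p{L}-]/u', '', 'Prêt-à-porter, 2019!');

echo $escaped, PHP_EOL;     // "Pr\u00eat-\u00e0-porter"
echo $literal, PHP_EOL;     // "Prêt-à-porter"
echo $roundTrip, PHP_EOL;   // Prêt-à-porter
echo $lettersOnly, PHP_EOL; // Prêt-à-porter
```

Either JSON form is valid; a JSON parser on the receiving side turns \u00ea back into ê.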
forever°为你锁心
Reply #3 · 2019-01-28 18:53

This may not be 100% accurate, but looking at the regex you're using, I don't think preg_replace() is the issue. I think the reason you are getting '\u00e' is PHP's poor support for character encodings.

Animai°情兽
Reply #4 · 2019-01-28 19:00

You could also use mb_ereg_replace() to work with multibyte characters in your string.
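
A minimal sketch of that approach, assuming the mbstring extension is loaded and the script is saved as UTF-8. The sample input is made up. Note that mb_ereg_replace() takes the pattern without delimiters.

```php
<?php
// Assumptions: mbstring is available and this file is saved as UTF-8.
mb_regex_encoding('UTF-8');

$input = 'Prêt-à-porter 2019!';

// Keep only ASCII letters, ê, à, é and the minus sign (trailing hyphen is literal).
$clean = mb_ereg_replace('[^a-zA-Zêàé-]', '', $input);

echo $clean, PHP_EOL; // Prêt-à-porter
```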

聊天终结者
Reply #5 · 2019-01-28 19:02

"Pr\u00eat-\u00e0-porter" is a correct JavaScript string literal representation of Prêt-à-porter. I assume you're doing a json_encode at some point along the line?

Note also that PHP's regular expressions are not Unicode-aware unless you add the /u modifier, so if you are using UTF-8 (which generally you want to be), the character ê is not a single character to the regex engine, but byte C3 followed by byte AA. That's fine for simple literal matches, but in a character class you're now matching the two bytes independently instead of as a sequence, which can easily mess up your expression.
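
The byte-level behaviour can be checked with a short sketch like this one. It assumes the script is saved as UTF-8; the variable names are illustrative only.

```php
<?php
// Assumption: this file is saved as UTF-8.
$eCirc = 'ê';

echo strlen($eCirc), PHP_EOL;  // 2 (bytes, not characters)
echo bin2hex($eCirc), PHP_EOL; // c3aa

// Without /u the class holds the bytes C3 and AA. 'é' is C3 A9,
// so only A9 is stripped and a lone C3 byte (invalid UTF-8) is left behind.
$broken = preg_replace('/[^ê]/', '', 'é');

// With /u the class holds the single code point U+00EA,
// so 'é' is removed as a whole character.
$clean = preg_replace('/[^ê]/u', '', 'é');

echo bin2hex($broken), PHP_EOL; // c3
var_dump($clean);               // string(0) ""
```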

唯我独甜
Reply #6 · 2019-01-28 19:09

If you want to replace 'é' with 'e', etc., use iconv() with the //TRANSLIT modifier.

e.g.,

$newString = iconv('UTF-8', 'ASCII//TRANSLIT', $myString);

A more complete example:

$ cat scratch.php
<?php
$x = "Prêt-à-porter";
var_dump(json_encode(iconv("UTF-8", "ASCII//TRANSLIT", $x)));


$ php scratch.php
string(15) ""Pret-a-porter""
$ 
姐就是有狂的资本
Reply #7 · 2019-01-28 19:10

From what I can see of your output, your characters are not removed (they are allowed by your pattern), so the only difference is that the output is Unicode-escaped. Try changing your document to UTF-8 or encoding HTML entities, and it should work. But beware: if you encode entities before replacing, the pattern won't detect the characters, as they will already have been converted.
