
Escaping Escape Characters

Posted 2019-03-19 16:07

Question:

I'm trying to mimic the json_encode bitmask flags implemented in PHP 5.3.0, here is the string I have:

$s = addslashes('O\'Rei"lly'); // O\'Rei\"lly

Doing json_encode($s, JSON_HEX_APOS | JSON_HEX_QUOT) outputs the following:

"O\\\u0027Rei\\\u0022lly"

And I'm currently doing this in PHP versions older than 5.3.0:

str_replace(array('\\"', "\\'"), array('\\u0022', '\\\u0027'), json_encode($s))
or
str_replace(array('\\"', '\\\''), array('\\u0022', '\\\u0027'), json_encode($s))

Which correctly outputs the same result:

"O\\\u0027Rei\\\u0022lly"

I'm having trouble understanding why I need to replace single quotes ('\\\'' or even "\\'" [surrounding quotes excluded]) with '\\\u0027' and not just '\\u0027'.


Here is the code that I'm having trouble porting to PHP < 5.3:

if (get_magic_quotes_gpc() && version_compare(PHP_VERSION, '6.0.0', '<'))
{
    /* JSON_HEX_APOS and JSON_HEX_QUOT are available */
    if (version_compare(PHP_VERSION, '5.3.0', '>=') === true)
    {
        $_GET = json_encode($_GET, JSON_HEX_APOS | JSON_HEX_QUOT);
        $_POST = json_encode($_POST, JSON_HEX_APOS | JSON_HEX_QUOT);
        $_COOKIE = json_encode($_COOKIE, JSON_HEX_APOS | JSON_HEX_QUOT);
        $_REQUEST = json_encode($_REQUEST, JSON_HEX_APOS | JSON_HEX_QUOT);
    }

    /* mimic the behaviour of JSON_HEX_APOS and JSON_HEX_QUOT */
    else if (extension_loaded('json') === true)
    {
        $_GET = str_replace(array(), array('\\u0022', '\\u0027'), json_encode($_GET));
        $_POST = str_replace(array(), array('\\u0022', '\\u0027'), json_encode($_POST));
        $_COOKIE = str_replace(array(), array('\\u0022', '\\u0027'), json_encode($_COOKIE));
        $_REQUEST = str_replace(array(), array('\\u0022', '\\u0027'), json_encode($_REQUEST));
    }

    $_GET = json_decode(stripslashes($_GET));
    $_POST = json_decode(stripslashes($_POST));
    $_COOKIE = json_decode(stripslashes($_COOKIE));
    $_REQUEST = json_decode(stripslashes($_REQUEST));
}

Answer 1:

The PHP string

'O\'Rei"lly'

is just PHP's way of getting the literal value

O'Rei"lly

into a string which can be used. Calling addslashes on that string changes it to be literally the following 11 characters

O\'Rei\"lly

i.e. strlen(addslashes('O\'Rei"lly')) == 11

This is the value which is being sent to json_encode.

In JSON backslash is an escape character, so that needs to be escaped, i.e.

\ to be \\

Single and double quotes can also cause problems, so converting them to their Unicode escapes is one way to avoid them. So later versions of PHP's json_encode can change

' to be \u0027

and

" to be \u0022

So applying these three rules to

O\'Rei\"lly

gives us

O\\\u0027Rei\\\u0022lly

This string is then wrapped in double quotes to make it a JSON string. Your replace expressions include the leading backslashes. Whether by accident or on purpose, this means that the leading and trailing double quotes returned by json_encode are not touched by the replacement, which is what you want.
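As a rough sketch of the three rules plus the final wrapping, the snippet below applies them in one pass with strtr(). The variable names ($value, $rules, $body, $json) are made up for illustration, and this is not how json_encode is implemented internally; it only reproduces the result for this particular string.

```php
<?php
// The 11 characters O\'Rei\"lly
$value = addslashes('O\'Rei"lly');

// strtr() with an array scans the input once, so text produced by one
// rule is never rewritten by another rule.
$rules = array(
    '\\' => '\\\\',     // one backslash becomes two
    "'"  => '\\u0027',  // apostrophe becomes \u0027
    '"'  => '\\u0022',  // double quote becomes \u0022
);

$body = strtr($value, $rules);   // O\\\u0027Rei\\\u0022lly
$json = '"' . $body . '"';       // wrap in double quotes

echo strlen($value), "\n";       // 11
echo $json, "\n";                // "O\\\u0027Rei\\\u0022lly"
```

On PHP 5.3.0 or later, the result should match what the question reports for json_encode($value, JSON_HEX_APOS | JSON_HEX_QUOT).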

So in earlier versions of PHP

$s = addslashes('O\'Rei"lly');
print json_encode($s);

would print

"O\\'Rei\\\"lly"

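and we want to change ' to \u0027 and \" to \u0022. The \ in \" is only there to get the " into the JSON string, which itself begins and ends with double quotes, so it can be replaced together with the quote. The search string for the apostrophe, \', is different: it swallows one of the two backslashes that belong to the data, so its replacement has to put that backslash back, which is why it is \\u0027 rather than \u0027.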

So that's why we get

"O\\\u0027Rei\\\u0022lly"
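The backslash counting can be checked with a short script. This is only a sketch of the question's str_replace workaround, with illustrative variable names, and it assumes every quote in the value was already prefixed by addslashes(); it is not a general replacement for the HEX flags.

```php
<?php
// Output of json_encode() without the HEX flags: "O\\'Rei\\\"lly"
$plain = json_encode(addslashes('O\'Rei"lly'));

// Search strings: \" and \' (each takes one backslash along with the quote).
$search = array('\\"', '\\\'');

// Replacements: \u0022 and \\u0027.
//   before "  there are 3 backslashes; the search takes 1, leaving 2,
//             so one more from \u0022 gives the 3 we need.
//   before '  there are 2 backslashes; the search takes 1, leaving 1,
//             so the replacement must supply 2 to reach 3.
$replace = array('\\u0022', '\\\\u0027');

$hex = str_replace($search, $replace, $plain);

echo $plain, "\n";   // "O\\'Rei\\\"lly"
echo $hex, "\n";     // "O\\\u0027Rei\\\u0022lly"
```

Note that '\\\\u0027' here and the question's '\\\u0027' are the same PHP string, because \u is not an escape sequence inside single quotes.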


Answer 2:

It's escaping the backslash as well as the quote. It's difficult dealing with escaped escapes, as you're doing here, as it quickly turns into backslash counting games. :-/



Answer 3:

If I understand correctly, you just want to know why you need to use

'\\\u0027' and not just '\\u0027'

You're escaping both the backslash and the character's Unicode value. The \u0027 tells JSON to put an apostrophe there; it needs the backslash and the u to know that a hexadecimal Unicode character code comes next.

Since you are escaping this string:

$s = addslashes('O\'Rei"lly'); // O\'Rei\"lly

the first backslash is actually escaping the backslash before the apostrophe. The next backslash is the one JSON uses to identify what follows as a Unicode escape.

If you were applying the algorithm to O'Reilly instead of O\'Rei\"lly, then the latter ('\\u0027') would suffice.
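A small sketch of that unescaped case, using the question's string without addslashes(). The variable names are illustrative, and the replacement assumes the value does not end in a backslash (otherwise the closing quote would be matched as well).

```php
<?php
// No addslashes() this time, so the apostrophe has no backslash in front.
$plain = json_encode('O\'Rei"lly');   // "O'Rei\"lly"

// \"  becomes \u0022 and a bare  '  becomes \u0027.
$hex = str_replace(
    array('\\"', "'"),
    array('\\u0022', '\\u0027'),
    $plain
);

echo $hex, "\n";   // "O\u0027Rei\u0022lly"
```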

I hope you find this useful. I'll just leave you this link so you can read more on how JSON is constructed, since it's obvious you already understand PHP:

http://www.json.org/fatfree.html



Answer 4:

When you encode a string for json, some things have to be escaped regardless of the options. As others have pointed out, that includes '\' so any backslash run through json_encode will be doubled. Since you are first running your string through addslashes, which also adds backslashes to quotes, you are adding a lot of extra backslashes. The following function will emulate how json_encode would encode a string. If the string has already had backslashes added, they will be doubled.

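function json_encode_string( $encode , $options ) {
    // Double quotes are required so that PHP expands \0 and \37 itself:
    // the list is the backslash plus the control characters 0..31.
    // In single quotes the sequences would reach addcslashes() as literal
    // text and describe a different range.
    // Caveat: addcslashes() writes some control characters in octal (\ooo),
    // which JSON does not accept.
    $escape = "\\\0..\37";
    $needle = array();
    $replace = array();

    // Without JSON_HEX_APOS, json_encode() leaves ' untouched, so there is
    // no else branch adding it to $escape.
    if ( $options & JSON_HEX_APOS ) {
        $needle[] = "'";
        $replace[] = '\u0027';
    }

    if ( $options & JSON_HEX_QUOT ) {
        $needle[] = '"';
        $replace[] = '\u0022';
    } else {
        $escape .= '"';
    }

    if ( $options & JSON_HEX_AMP ) {
        $needle[] = '&';
        $replace[] = '\u0026';
    }

    if ( $options & JSON_HEX_TAG ) {
        $needle[] = '<';
        $needle[] = '>';
        $replace[] = '\u003C';
        $replace[] = '\u003E';
    }

    // Escape backslashes and control characters first, then swap in the
    // \uXXXX forms; the result is not wrapped in double quotes.
    $encode = addcslashes( $encode , $escape );
    $encode = str_replace( $needle , $replace , $encode );

    return $encode;
}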


Answer 5:

Since you are going to json_encode the string \', you have to encode the \ first and then the '. That gives you \\ and \u0027. Concatenating these gives \\\u0027.



Answer 6:

The \ generated by addslashes() gets re-escaped by json_encode(). You probably meant to write "Doing json_encode($s, JSON_HEX_APOS | JSON_HEX_QUOT) outputs the following", but you used $str instead of $s, which confused everyone.

If you evaluate the string "O\\\u0027Rei\\\u0022lly" in JavaScript, you will get "O\'Rei\"lly", and I am pretty sure that's not what you want. When you evaluate it, you probably need all the control codes removed. Go ahead, poke this in a file: alert("O\\\u0027Rei\\\u0022lly").

Conclusion: You are escaping the quotes twice, which is most likely not what you need. json_encode already escapes everything that is needed so that any JavaScript parser would return the original data structure. In your case, that is the string you have obtained after the call to addslashes.


Proof:

<?php $out = json_encode(array(10, "h'ello", addslashes("h'ello re-escaped"))); ?>
<script type="text/javascript">
  var out = <?php echo $out; ?>;
  alert(out[0]);
  alert(out[1]);
  alert(out[2]);
</script>
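The proof above needs a browser. As a rough equivalent that stays in PHP, and assuming json_decode() is an acceptable stand-in for the JavaScript parser, the sketch below checks that the decoded value is exactly what went in, slashes included. The variable names are illustrative.

```php
<?php
$s = addslashes("h'ello re-escaped");          // h\'ello re-escaped
$json = json_encode(array(10, "h'ello", $s));

// Decode into a PHP array and compare with the original values.
$back = json_decode($json, true);

var_dump($back[1] === "h'ello");               // bool(true)
var_dump($back[2] === $s);                     // bool(true): the slash survives
echo stripslashes($back[2]), "\n";             // h'ello re-escaped
```

The backslash added by addslashes() comes back out of the round trip, so it still has to be removed with stripslashes() if the unescaped text is wanted.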