urlencode vs rawurlencode?

2019-01-01 07:32发布

If I want to create a URL using a variable I have two choices to encode the string. urlencode() and rawurlencode().

What exactly are the differences and which is preferred?

11条回答
笑指拈花
2楼-- · 2019-01-01 08:12

urlencode: This differs from the » RFC 1738 encoding (see rawurlencode()) in that for historical reasons, spaces are encoded as plus (+) signs.

查看更多
听够珍惜
3楼-- · 2019-01-01 08:17

I believe urlencode is for query parameters, whereas the rawurlencode is for the path segments. This is mainly due to %20 for path segments vs + for query parameters. See this answer which talks about the spaces: When to encode space to plus (+) or %20?

However %20 now works in query parameters as well, which is why rawurlencode is always safer. However the plus sign tends to be used where user experience of editing and readability of query parameters matter.

Note that this means rawurldecode does not decode + into spaces (http://au2.php.net/manual/en/function.rawurldecode.php). This is why the $_GET is always automatically passed through urldecode, which means that + and %20 are both decoded into spaces.

If you want the encoding and decoding to be consistent between inputs and outputs and you have selected to always use + and not %20 for query parameters, then urlencode is fine for query parameters (key and value).

The conclusion is:

Path Segments - always use rawurlencode/rawurldecode

Query Parameters - for decoding always use urldecode (done automatically), for encoding, both rawurlencode or urlencode is fine, just choose one to be consistent, especially when comparing URLs.

查看更多
步步皆殇っ
4楼-- · 2019-01-01 08:19

It will depend on your purpose. If interoperability with other systems is important then it seems rawurlencode is the way to go. The one exception is legacy systems which expect the query string to follow form-encoding style of spaces encoded as + instead of %20 (in which case you need urlencode).

rawurlencode follows RFC 1738 prior to PHP 5.3.0 and RFC 3986 afterwards (see http://us2.php.net/manual/en/function.rawurlencode.php)

Returns a string in which all non-alphanumeric characters except -_.~ have been replaced with a percent (%) sign followed by two hex digits. This is the encoding described in » RFC 3986 for protecting literal characters from being interpreted as special URL delimiters, and for protecting URLs from being mangled by transmission media with character conversions (like some email systems).

Note on RFC 3986 vs 1738. rawurlencode prior to php 5.3 encoded the tilde character (~) according to RFC 1738. As of PHP 5.3, however, rawurlencode follows RFC 3986 which does not require encoding tilde characters.

urlencode encodes spaces as plus signs (not as %20 as done in rawurlencode)(see http://us2.php.net/manual/en/function.urlencode.php)

Returns a string in which all non-alphanumeric characters except -_. have been replaced with a percent (%) sign followed by two hex digits and spaces encoded as plus (+) signs. It is encoded the same way that the posted data from a WWW form is encoded, that is the same way as in application/x-www-form-urlencoded media type. This differs from the » RFC 3986 encoding (see rawurlencode()) in that for historical reasons, spaces are encoded as plus (+) signs.

This corresponds to the definition for application/x-www-form-urlencoded in RFC 1866.

Additional Reading:

You may also want to see the discussion at http://bytes.com/groups/php/5624-urlencode-vs-rawurlencode.

Also, RFC 2396 is worth a look. RFC 2396 defines valid URI syntax. The main part we're interested in is from 3.4 Query Component:

Within a query component, the characters ";", "/", "?", ":", "@",
"&", "=", "+", ",", and "$"
are reserved.

As you can see, the + is a reserved character in the query string and thus would need to be encoded as per RFC 3986 (as in rawurlencode).

查看更多
永恒的永恒
5楼-- · 2019-01-01 08:20

simple * rawurlencode the path - path is the part before the "?" - spaces must be encoded as %20 * urlencode the query string - Query string is the part after the "?" -spaces are better encoded as "+" = rawurlencode is more compatible generally

查看更多
千与千寻千般痛.
6楼-- · 2019-01-01 08:23

1. What exactly are the differences and

The only difference is in the way spaces are treated:

urlencode - based on legacy implementation converts spaces to +

rawurlencode - based on RFC 1738 translates spaces to %20

The reason for the difference is because + is reserved and valid (unencoded) in urls.

2. which is preferred?

I'd really like to see some reasons for choosing one over the other ... I want to be able to just pick one and use it forever with the least fuss.

Fair enough, I have a simple strategy that I follow when making these decisions which I will share with you in the hope that it may help.

I think it was the HTTP/1.1 specification RFC 2616 which called for "Tolerant applications"

Clients SHOULD be tolerant in parsing the Status-Line and servers tolerant when parsing the Request-Line.

When faced with questions like these the best strategy is always to consume as much as possible and produce what is standards compliant.

So my advice is to use rawurlencode to produce standards compliant RFC 1738 encoded strings and use urldecode to be backward compatible and accomodate anything you may come across to consume.

Now you could just take my word for it but lets prove it shall we...

php > $url = <<<'EOD'
<<< > "Which, % of Alice's tasks saw $s @ earnings?"
<<< > EOD;
php > echo $url, PHP_EOL;
"Which, % of Alice's tasks saw $s @ earnings?"
php > echo urlencode($url), PHP_EOL;
%22Which%2C+%25+of+Alice%27s+tasks+saw+%24s+%40+earnings%3F%22
php > echo rawurlencode($url), PHP_EOL;
%22Which%2C%20%25%20of%20Alice%27s%20tasks%20saw%20%24s%20%40%20earnings%3F%22
php > echo rawurldecode(urlencode($url)), PHP_EOL;
"Which,+%+of+Alice's+tasks+saw+$s+@+earnings?"
php > // oops that's not right???
php > echo urldecode(rawurlencode($url)), PHP_EOL;
"Which, % of Alice's tasks saw $s @ earnings?"
php > // now that's more like it

It would appear that PHP had exactly this in mind, even though I've never come across anyone refusing either of the two formats, I cant think of a better strategy to adopt as your defacto strategy, can you?

nJoy!

查看更多
梦醉为红颜
7楼-- · 2019-01-01 08:26

I believe spaces must be encoded as:

  • %20 when used inside URL path component
  • + when used inside URL query string component or form data (see 17.13.4 Form content types)

The following example shows the correct use of rawurlencode and urlencode:

echo "http://example.com"
    . "/category/" . rawurlencode("latest songs")
    . "/search?q=" . urlencode("lady gaga");

Output:

http://example.com/category/latest%20songs/search?q=lady+gaga

What happens if you encode path and query string components the other way round? For the following example:

http://example.com/category/latest+songs/search?q=lady%20gaga
  • The webserver will look for the directory latest+songs instead of latest songs
  • The query string parameter q will contain lady gaga
查看更多
登录 后发表回答