Which characters are allowed in GET parameters without encoding or escaping them? I mean something like this:
http://www.example.org/page.php?name=XYZ
What can you have there instead of XYZ? I think only the following characters:
- a-z (A-Z)
- 0-9
- -
- _
Is this the full list or are there additional characters allowed?
I hope you can help me. Thanks in advance!
I did a test using the Chrome address bar and a $QUERY_STRING in bash, and observed the following:
~!@$%^&*()-_=+[{]}\|;:',./? and grave (backtick) are passed through as plaintext.
Space, ", < and > are converted to %20, %22, %3C and %3E respectively.
# is ignored, since it is used by ye olde anchor.
Personally, I'd say bite the bullet and encode with base64 :)
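If you go the base64 route, here is a minimal Python sketch of what that could look like. It assumes the URL-safe alphabet (which uses - and _ instead of + and /) with the = padding stripped, so the token contains only characters from the question's list. The helper names encode_param and decode_param are made up for illustration.

```python
import base64


def encode_param(value: str) -> str:
    """Encode arbitrary text as URL-safe base64 without '=' padding."""
    raw = base64.urlsafe_b64encode(value.encode("utf-8"))
    return raw.rstrip(b"=").decode("ascii")


def decode_param(token: str) -> str:
    """Reverse encode_param, restoring the padding that was stripped."""
    padding = "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(token + padding).decode("utf-8")


token = encode_param('a b"<>#&=')
print(token, "->", decode_param(token))
```

The trade-off is that the parameter is no longer human-readable and grows by roughly a third, so this only makes sense when the value is opaque anyway.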
Alphanumeric characters and all of ~ - _ . ! * ' ( ) , are valid within a URL.
All other characters must be encoded.
All of the rules concerning the encoding of URIs (which include URNs and URLs) are specified in RFC 1738 and RFC 3986. Here's a TL;DR of these long and boring documents:
Percent-encoding, also known as URL encoding, is a mechanism for encoding information in a URI under certain circumstances. The characters allowed in a URI are either reserved or unreserved. Reserved characters are those that sometimes have special meaning, but they are not the only characters that need encoding.
There are 66 unreserved characters that don't need any encoding:
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.~
There are 18 reserved characters that need to be encoded:
!*'();:@&=+$,/?#[]
All other characters must be encoded as well. To percent-encode a character, concatenate "%" and its ASCII value in hexadecimal. The PHP functions "urlencode" and "rawurlencode" do this job for you.
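To make the mechanism concrete, here is a rough Python sketch of percent-encoding as described above: unreserved characters pass through, everything else becomes "%" plus two hex digits. One assumption that goes beyond the text: non-ASCII characters are first converted to UTF-8 bytes and each byte is encoded separately, which is the usual convention today. The names UNRESERVED and percent_encode are made up for this example.

```python
import string

# The 66 unreserved characters: letters, digits and - _ . ~
UNRESERVED = set(string.ascii_letters + string.digits + "-_.~")


def percent_encode(text: str) -> str:
    """Percent-encode every byte that is not an unreserved character."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)
        else:
            # "%" followed by the byte value as two uppercase hex digits
            out.append(f"%{byte:02X}")
    return "".join(out)


print(percent_encode("a b&c"))
```

In real code you would normally reach for the library function instead (urllib.parse.quote in Python, rawurlencode in PHP); the sketch is only meant to show what those functions do under the hood.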
The question asks which characters are allowed in GET parameters without encoding or escaping them.
According to RFC3986 (general URL syntax) and RFC7230, section 2.7.1 (HTTP/S URL syntax), the only characters you need to percent-encode are those outside the query set; see the definition below.
However, there are additional specifications such as HTML5, Web forms, and the obsolete Indexed search W3C recommendation. Those documents give a special meaning to some characters, notably = & + ;.
Other answers here suggest that most of the reserved characters should be encoded, including "/" and "?". That's not correct. In fact, RFC3986, section 3.4 advises against percent-encoding the "/" and "?" characters.
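RFC3986 defines the query component as:
query       = *( pchar / "/" / "?" )
pchar       = unreserved / pct-encoded / sub-delims / ":" / "@"
pct-encoded = "%" HEXDIG HEXDIG
unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="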
The conclusion is that XYZ part should encode:
Unless special symbols = & ; are key=value separators.
Encoding other characters is allowed but not necessary.
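As a quick sanity check, the RFC 3986 query rule (pchar plus "/" and "?", where pchar is unreserved, sub-delims, ":" and "@") can be turned into a few lines of Python that list which printable ASCII characters fall outside the query set. This is only a sketch: the names are made up, and it treats a literal "%" as needing encoding because "%" is only valid as the start of a percent-encoded triplet.

```python
import string

# Character classes from the RFC 3986 grammar
UNRESERVED = string.ascii_letters + string.digits + "-._~"
SUB_DELIMS = "!$&'()*+,;="
# query = *( pchar / "/" / "?" ), pchar adds ":" and "@"
QUERY_ALLOWED = set(UNRESERVED + SUB_DELIMS + ":@/?")


def chars_needing_encoding() -> str:
    """Return the printable ASCII characters that are not in the query set."""
    printable_ascii = (chr(code) for code in range(0x20, 0x7F))
    return "".join(ch for ch in printable_ascii if ch not in QUERY_ALLOWED)


print(repr(chars_needing_encoding()))
```

Note that this only covers the generic syntax. Characters such as = & + ; are inside the query set, but still have to be encoded when they would otherwise be read as key=value separators.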
There are reserved characters, which have a reserved meaning. Those are the delimiters :/?#[]@ and the subdelimiters !$&'()*+,;=
There is also a set of characters called unreserved characters (alphanumerics and -._~), which are not to be encoded.
That means that anything that doesn't belong to the unreserved character set is supposed to be %-encoded when it does not have a special meaning (e.g. when passed as part of a GET parameter).
See also RFC3986: Uniform Resource Identifier (URI): Generic Syntax
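In practice you rarely encode by hand. As an illustration, here is a small Python sketch using the standard urllib.parse module to build the URL from the question with a value that contains delimiters, and then parse it back. The helper name build_url and the sample value are made up; quote_via=quote is passed so that a space becomes %20 rather than the form-style "+".

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def build_url(base: str, params: dict) -> str:
    """Append percent-encoded GET parameters to a base URL."""
    # urlencode passes safe="" to quote, so "&" and "=" inside values are encoded
    query = urlencode(params, quote_via=quote)
    return f"{base}?{query}"


url = build_url("http://www.example.org/page.php", {"name": "a&b=c d"})
print(url)

# Parsing the query string decodes the value again
print(parse_qs(urlsplit(url).query))
```

Because the & and = inside the value are encoded, the receiving side sees a single parameter called name instead of two broken ones.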
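From RFC 1738 on which characters are allowed in URLs:
Only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL.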
The reserved characters are ";", "/", "?", ":", "@", "=" and "&", which means you would need to URL encode them if you wish to use them.