The RFC 3986 URI: Generic Syntax spec lists a semicolon as a reserved (sub-delim) character:
reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
What is the reserved purpose of the semicolon (";") in URIs? For that matter, what is the purpose of the other sub-delims (I'm only aware of purposes for "&", "+", and "=")?
The intent is clearer if you go back to older versions of the specification:
I believe it has its origins in FTP URIs.
Since 2014, path segments have been known to contribute to Reflected File Download attacks. Let's assume we have a vulnerable API that reflects whatever we send to it (the URL was apparently real and has since been fixed):
This is harmless in a browser: the response is JSON, so it is not rendered, and the browser instead offers to download it as a file. This is where path segments come to help (for the attacker):
Everything between the semicolons (;/setup.bat;) will not be sent to the web service; instead, the browser will interpret it as the file name under which to save the API response. A file called setup.bat will then be downloaded and run without any warning about the dangers of running files downloaded from the Internet (because its name contains the word "setup"). The contents will be interpreted as a Windows batch file, and the calc.exe command will be run.
Prevention:
Content-Disposition: attachment; filename="whatever.txt" on APIs that are not going to be rendered; Google was missing the filename part, which actually made the attack easier
Add the X-Content-Type-Options: nosniff header to API responses

Section 3.3 covers this - it's an opaque delimiter a URI-producing application can use if convenient:
There is an explanation at the end of section 3.3.
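The effect of that reservation can be sketched in a few lines of Python. This is only an illustration, not anything mandated by the RFC: the path value and the helper name split_semicolon_list are made up here. The point is that splitting on the literal ";" has to happen before percent-decoding, otherwise an encoded semicolon is mistaken for a delimiter.

```python
from urllib.parse import unquote


def split_semicolon_list(segment):
    """Split on the literal ';' first, then percent-decode each part."""
    return [unquote(part) for part in segment.split(";")]


# Hypothetical path segment, chosen only for illustration.
segment = "foo;bar;baz%3bqux"

parts = split_semicolon_list(segment)
print(parts)  # ['foo', 'bar', 'baz;qux']

# Decoding before splitting erases the difference between ';' and '%3b'.
wrong = unquote(segment).split(";")
print(wrong)  # ['foo', 'bar', 'baz', 'qux']
```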
In other words, it is reserved so that people who want a delimited list of something in the URL can safely use ; as a delimiter even if the parts contain ;, as long as the contents are percent-encoded. A list whose last item contains an encoded semicolon (%3b) can therefore be interpreted as three parts: foo, bar, baz;qux. If the semicolon were not a reserved character, ; and %3b would be equivalent, so the URI would be incorrectly interpreted as four parts: foo, bar, baz, qux.

I found the following use cases:
It's the final character of an HTML entity:
https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
Apache Tomcat 7 (or newer versions?!) uses it as a path parameter:
https://superevr.com/blog/2011/three-semicolon-vulnerabilities
The data URI scheme uses it to separate the MIME type from the parameters that precede the data:
https://en.wikipedia.org/wiki/Data_URI_scheme
And there was a bug in IIS5 and IIS6 that allowed bypassing file upload restrictions:
https://www.owasp.org/index.php/Unrestricted_File_Upload
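Two of the use cases above, path parameters and data URIs, can be shown with Python's standard library. This is a rough sketch under stated assumptions: the example.com URL, the jsessionid value, and the helper parse_data_uri_header are hypothetical. Note that urlparse only splits parameters off the last path segment, which may differ from how a given server treats semicolons.

```python
from urllib.parse import urlparse

# Hypothetical URL carrying a ';' path parameter on its last segment.
url = "http://example.com/app/index;jsessionid=ABC123?lang=en"
parsed = urlparse(url)
print(parsed.path)    # /app/index
print(parsed.params)  # jsessionid=ABC123
print(parsed.query)   # lang=en


def parse_data_uri_header(uri):
    """Return (media_type, parameters) from the part of a data URI before the first comma."""
    header, _, _data = uri[len("data:"):].partition(",")
    media_type, *params = header.split(";")
    return media_type, params


# Illustrative data URI: ';' separates the media type from its parameters.
media_type, params = parse_data_uri_header(
    "data:text/plain;charset=utf-8;base64,SGVsbG8="
)
print(media_type)  # text/plain
print(params)      # ['charset=utf-8', 'base64']
```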
Conclusion:
Do not use semicolons in URLs, or they could accidentally produce an HTML entity or URI scheme.
There are some conventions around its current usage that are interesting. These speak to when to use a semicolon or comma. From the book "RESTful Web Services":