What two separator characters would work in a URL

Posted 2019-02-21 13:17

Question:

I use anchors in my URLs, allowing people to bookmark 'active pages' in a web application. I used anchors because they fit easily within the GWT history mechanism.

My existing implementation encodes navigation and data information into the anchor, separated by the '-' character, i.e. it creates anchors like #location-location-key-value-key-value

It works, apart from negative values (like -1) causing serious parsing problems, but I've now found that having two separator characters would be better. Also, given the negative-number issue, I'd like to stop using '-'.

What other characters work in a URL anchor that won't interfere with the URL or its GET params? How stable will these be in the future?

Answer 1:

Looking at RFC 3986 (the URI specification), section 3.5, a fragment identifier (which I believe is what you're referring to) is defined as

fragment    = *( pchar / "/" / "?" )

and from Appendix A

pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                 / "*" / "+" / "," / ";" / "="
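
As a rough illustration, the grammar above can be turned into a quick check of which candidate separators may appear unescaped in a fragment. This is only a sketch: the helper name `allowed_unescaped` and the candidate list are made up for the example, and the character sets are transcribed from the rules quoted above.

```python
import string

# Character classes transcribed from the grammar quoted above.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")
# fragment = *( pchar / "/" / "?" ), where pchar also allows ":" and "@".
FRAGMENT_LITERALS = UNRESERVED | SUB_DELIMS | set(":@/?")


def allowed_unescaped(ch: str) -> bool:
    """Return True if ch may appear literally in a fragment (hypothetical helper)."""
    return ch in FRAGMENT_LITERALS


# Candidate separators chosen only for this example.
candidates = ";,=!*|# "
usable = [c for c in candidates if allowed_unescaped(c)]
print(usable)
```

Anything the check rejects (for example `|`, `#` or a space) would have to be percent-encoded before it goes into the fragment.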

Interestingly, the spec also says that

"The characters slash ("/") and question mark ("?") are allowed to represent data within the fragment identifier."

So it appears that real anchors, like

<a href="#name?a=1&b=2">
....
<a name="name?a=1&b=2">

are supposed to be legal, and look very much like a normal URL query string. (A quick check verified that these do work correctly in at least Chrome, Firefox and IE.) Since this works, I'm assuming you can use your method to have URLs like

http://www.site.com/foo.html?real=1&parameters=2#fake=2&parameters=3

with no problem (i.e. the 'parameters' variable in the fragment shouldn't interfere with the one in the query string).
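
One way to convince yourself of this outside the browser is to split the example URL with Python's standard `urllib.parse` and parse the two parts separately. This is just a sketch of the idea, not how GWT handles history tokens.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://www.site.com/foo.html?real=1&parameters=2#fake=2&parameters=3"
parts = urlsplit(url)

# The query string and the fragment are separate components,
# so the same key can appear in both without clashing.
query_params = parse_qs(parts.query)
fragment_params = parse_qs(parts.fragment)

print(query_params)
print(fragment_params)
```

Each dictionary carries its own `parameters` entry, with the value taken from its own side of the `#`.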

You can also use percent-encoding when necessary, and many of the other characters defined in sub-delims (such as ';' and ',') could serve as separators.
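
To make that concrete, here is a minimal sketch of a two-separator scheme with percent-encoding of the data. The choice of `;` and `=`, the layout, and the function names `encode_fragment` / `decode_fragment` are assumptions for illustration only; nothing here comes from GWT.

```python
from urllib.parse import quote, unquote

PAIR_SEP = ";"  # assumed separator between entries
KV_SEP = "="    # assumed separator between a key and its value


def encode_fragment(locations, params):
    """Build 'loc;loc;key=value;key=value' (hypothetical layout)."""
    # safe="" makes quote() escape ';' and '=' inside the data,
    # while unreserved characters such as '-' are left as they are.
    parts = [quote(loc, safe="") for loc in locations]
    parts += [
        quote(key, safe="") + KV_SEP + quote(str(value), safe="")
        for key, value in params.items()
    ]
    return PAIR_SEP.join(parts)


def decode_fragment(fragment):
    """Reverse encode_fragment; every value comes back as a string."""
    locations, params = [], {}
    for part in fragment.split(PAIR_SEP):
        if KV_SEP in part:
            key, value = part.split(KV_SEP, 1)
            params[unquote(key)] = unquote(value)
        elif part:
            locations.append(unquote(part))
    return locations, params


fragment = encode_fragment(["inbox", "thread"], {"offset": -1, "q": "a=b;c"})
print(fragment)
print(decode_fragment(fragment))
```

Because `-` is no longer a separator, a value like -1 survives the round trip, and any separator character that turns up inside the data is escaped rather than misparsed.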

NOTE:

Also from the spec:

"A fragment identifier component is indicated by the presence of a number sign ("#") character and terminated by the end of the URI."

So everything after the # is the fragment identifier, and should not interfere with GET parameters.
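
A small sketch with Python's `urllib.parse` shows the same rule in action. The URL is put together from the examples above purely for illustration: a `?` that appears after the `#` stays in the fragment and is not treated as the start of a query string.

```python
from urllib.parse import urlsplit

# Illustrative URL: a real query string, then a fragment that itself contains '?'.
parts = urlsplit("http://www.site.com/foo.html?real=1#name?a=1&b=2")

print(parts.query)     # only what sits between '?' and '#'
print(parts.fragment)  # everything after the first '#'
```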