How do browsers determine whether an URL in an hre

Posted 2020-07-18 09:11

Question:

Suppose I have the following link tag: <a href="tel:+15555555">Phone number</a>.

How exactly does the browser know not to load the relative location ./tel:+15555555 from the current server and instead know that tel is supposed to be interpreted as a scheme?

Detecting host-relative URLs (/…) or protocol-relative URLs (//…) seems to be trivial. I guess HTTP URLs (http://… or https://…) would be simple to special-case as well. But how does the browser go about parsing a URL with an arbitrary scheme? I know a valid scheme has to start with a letter and may only contain letters, digits, or the characters +, - and . (schemes are case-insensitive, with lowercase as the canonical form), which limits the scope somewhat… Of course I'm aware that the whole issue only pertains to contexts where relative URLs are valid (i.e. mostly the href and src attributes).

I'm looking for links to the relevant RFC (e.g. one that forbids non-URL-encoded colons from being anything but scheme separators) as well as to the source code of various browsers' URL-parsing internals.

Answer 1:

The href value is parsed as a URI (see RFC 3986). From that parse, the browser knows that "tel:+15555555" is an absolute URI, not a relative reference.

In fact, an unescaped ":" is allowed in the path component. In a relative reference it just has to appear after the first "/"; otherwise it could be parsed as the scheme delimiter if the preceding characters are all valid scheme-name characters.
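The rule in this answer can be sketched as a small scanner. This is a minimal illustration of the RFC 3986 scheme production (ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) followed by ":"), not any browser's actual code, and the function name split_scheme is hypothetical. Browsers implement the WHATWG URL Standard, whose scheme state makes the same decision for inputs like these, but that parser also strips whitespace and handles other details this sketch ignores.

```python
import string

# RFC 3986, section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
_SCHEME_FIRST = set(string.ascii_letters)
_SCHEME_REST = set(string.ascii_letters + string.digits + "+-.")


def split_scheme(ref):
    """Return (scheme, rest) if ref starts with a scheme, else (None, ref).

    Hypothetical helper for illustration: it only applies the scheme rule
    and does not validate the rest of the URI.
    """
    if not ref or ref[0] not in _SCHEME_FIRST:
        # "/x", "//host", "./x", "1abc:x": cannot start with a scheme
        return None, ref
    for i, ch in enumerate(ref):
        if ch == ":":
            # Every character before the first ":" is a valid scheme char.
            # Schemes are case-insensitive; lowercase is the canonical form.
            return ref[:i].lower(), ref[i + 1:]
        if ch not in _SCHEME_REST:
            # Hit "/", "?", "#" or another non-scheme char before any ":",
            # so this is a relative reference.
            return None, ref
    # No ":" at all, e.g. "page.html"
    return None, ref


for ref in ["tel:+15555555", "./tel:+15555555", "a/b:c", "HTTP://example.com/"]:
    print(repr(ref), "->", split_scheme(ref))
```

The scan stops at the first character that cannot belong to a scheme, which is why a "/" anywhere before the colon is enough to make the reference relative.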

See http://greenbytes.de/tech/webdav/rfc3986.html#path

The RFC also has the following to say in section 4.2 (titled "Relative Reference"): "A path segment that contains a colon character (e.g., "this:that") cannot be used as the first segment of a relative-path reference, as it would be mistaken for a scheme name. Such a segment must be preceded by a dot-segment (e.g., "./this:that") to make a relative-path reference."
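The effect of that rule can be checked with Python's standard urllib.parse.urljoin, which resolves a reference against a base URL. The base URL http://example.com/dir/page is an arbitrary example. Python's parser is not a browser's, but for these inputs it makes the same scheme decision: "this:that" stays an absolute URI with scheme "this", while the dot-segment form resolves to a path on the base server.

```python
from urllib.parse import urljoin

base = "http://example.com/dir/page"  # arbitrary example base URL

# A leading "name:" made of valid scheme characters is read as a scheme,
# so these references come back unchanged instead of being resolved.
print(urljoin(base, "tel:+15555555"))    # tel:+15555555
print(urljoin(base, "this:that"))        # this:that

# The dot-segment puts a "/" before the colon, so the reference is
# relative and is resolved against the base path.
print(urljoin(base, "./this:that"))      # http://example.com/dir/this:that
print(urljoin(base, "./tel:+15555555"))  # http://example.com/dir/tel:+15555555
```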



Answer 2:

See RFC 3966 for the tel URI specification, and RFC 3986 for the more generic URI specification. It's the colon (:) that separates the scheme from the "hier-part".
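RFC 3986 Appendix B gives a regular expression for splitting a URI reference into its components, which makes the scheme/hier-part split visible. The sketch below uses that expression as published; the helper name components is hypothetical. For "tel:+15555555" the hier-part has no "//" authority, so everything after the colon lands in the path. Note that the expression only decomposes a string and does not validate it (it would also accept "1abc" as a scheme), so a strict check of the scheme characters is still needed.

```python
import re

# Regular expression from RFC 3986, Appendix B.
# Group 2 = scheme, group 4 = authority, group 5 = path,
# group 7 = query, group 9 = fragment.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def components(ref):
    """Hypothetical helper: split a URI reference per RFC 3986 Appendix B."""
    m = URI_RE.match(ref)  # always matches, since every group is optional
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


print(components("tel:+15555555"))
print(components("./tel:+15555555"))
print(components("http://example.com/a?b#c"))
```

Because the scheme group excludes "/", the "./tel:+15555555" case gets no scheme and the whole string, colon included, is treated as a path.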