Which characters make a URL invalid?-第2页回答

2楼-- · 2018-12-31 01:33

Not really an answer to your question but validating url's is really a serious p.i.t.a You're probably just better off validating the domainname and leave query part of the url be. That is my experience. You could also resort to pinging the url and seeing if it results in a valid response but that might be too much for such a simple task.

Regular expressions to detect url's are abundant, google it :)

0人赞添加讨论(0) 举报

千与千寻千般痛.

3楼-- · 2018-12-31 01:35

All valid characters that can be used in a URI (a URL is a type of URI) are defined in RFC 3986.

All other characters can be used in a URL provided that they are "URL Encoded" first. This involves changing the invalid character for specific "codes" (usually in the form of the percent symbol (%) followed by a hexadecimal number).

This link, HTML URL Encoding Reference, contains a list of the encodings for invalid characters.

0人赞添加讨论(0) 举报

姐姐魅力值爆表

4楼-- · 2018-12-31 01:38

Several of Unicode character ranges are valid HTML5, although it might still not be a good idea to use them.

E.g., href docs say http://www.w3.org/TR/html5/links.html#attr-hyperlink-href:

The href attribute on a and area elements must have a value that is a valid URL potentially surrounded by spaces.

Then the definition of "valid URL" points to http://url.spec.whatwg.org/, which says it aims to:

Align RFC 3986 and RFC 3987 with contemporary implementations and obsolete them in the process.

That document defines URL code points as:

ASCII alphanumeric, "!", "$", "&", "'", "(", ")", "*", "+", ",", "-", ".", "/", ":", ";", "=", "?", "@", "_", "~", and code points in the ranges U+00A0 to U+D7FF, U+E000 to U+FDCF, U+FDF0 to U+FFFD, U+10000 to U+1FFFD, U+20000 to U+2FFFD, U+30000 to U+3FFFD, U+40000 to U+4FFFD, U+50000 to U+5FFFD, U+60000 to U+6FFFD, U+70000 to U+7FFFD, U+80000 to U+8FFFD, U+90000 to U+9FFFD, U+A0000 to U+AFFFD, U+B0000 to U+BFFFD, U+C0000 to U+CFFFD, U+D0000 to U+DFFFD, U+E1000 to U+EFFFD, U+F0000 to U+FFFFD, U+100000 to U+10FFFD.

The term "URL code points" is then used in the statement:

If c is not a URL code point and not "%", parse error.

in a several parts of the parsing algorithm, including the schema, authority, relative path, query and fragment states: so basically the entire URL.

Also, the validator http://validator.w3.org/ passes for URLs like "你好", and does not pass for URLs with characters like spaces "a b"

Of course, as mentioned by Stephen C, it is not just about characters but also about context: you have to understand the entire algorithm. But since class "URL code points" is used on key points of the algorithm, it that gives a good idea of what you can use or not.

See also: Unicode characters in URLs

0人赞添加讨论(0) 举报

牵手、夕阳

5楼-- · 2018-12-31 01:39

In your supplementary question you asked if www.example.com/file[/].html is a valid URL.

That URL isn't valid because a URL is a type of URI and a valid URI must have a scheme like http: (see RFC 3986).

If you meant to ask if http://www.example.com/file[/].html is a valid URL then the answer is still no because the square bracket characters aren't valid there.

The square bracket characters are reserved for URLs in this format: http://[2001:db8:85a3::8a2e:370:7334]/foo/bar (i.e. an IPv6 literal instead of a host name)

It's worth reading RFC 3986 carefully if you want to understand the issue fully.

0人赞添加讨论(0) 举报

Which characters make a URL invalid?

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间