What characters are valid in a URL? [duplicate]

Posted 2019-01-03 10:12

Possible Duplicate:
Which characters make a url invalid?

I'm trying to remove the non-URL part of a big string. Most of the regexes I found use a character class like [A-Za-z0-9-_.!~*'()], but a URL can contain more characters than that, for example http://127.0.0.1:8080/test?v=123#this

So which characters are valid in a URL according to the current standard?

EDIT:

They seem to be:

A-Za-z0-9-._~:/?#[]@!$&'()*+,;= and % followed by two hexadecimal digits
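A minimal Python sketch of that list, assuming a character-level check is enough for the job: it does not enforce where each character may appear. The names URL_CHAR, is_url_chars and find_url_candidates are hypothetical, and treating anything that starts with http:// or https:// as the beginning of a URL is an assumption about the input string.

```python
import re

# Hypothetical helper pattern: one URL character is either one of the listed
# characters or "%" followed by exactly two hex digits (a percent-encoded byte).
# Inside the class, "[" and "]" are escaped and "-" is placed last so all are literal.
URL_CHAR = r"(?:[A-Za-z0-9._~:/?#\[\]@!$&'()*+,;=-]|%[0-9A-Fa-f]{2})"


def is_url_chars(s: str) -> bool:
    """Return True if s is non-empty and uses only allowed URL characters.

    Only the character set is checked, not where each character appears.
    """
    return re.fullmatch(URL_CHAR + "+", s) is not None


def find_url_candidates(text: str) -> list[str]:
    """Hypothetical extractor: runs of URL characters starting with http:// or https://."""
    return re.findall(r"https?://" + URL_CHAR + "+", text)


print(find_url_candidates("see http://127.0.0.1:8080/test?v=123#this for details"))
# ['http://127.0.0.1:8080/test?v=123#this']
```

Since "." and ")" are themselves valid URL characters, a URL at the end of a sentence or inside parentheses will pick up that trailing punctuation; trimming it is a separate heuristic.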

Tags: html url
1 answer
Summer. 凉城
#2 · answered 2019-01-03 10:49

All the gory details can be found in the current RFC on the topic: RFC 3986 (Uniform Resource Identifier (URI): Generic Syntax).
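RFC 3986 splits a URI into components (scheme, authority, path, query, fragment), and that split is what decides where characters such as :, ? and # may appear. As an illustration, here is Python's standard urllib.parse.urlsplit applied to the example URL from the question:

```python
from urllib.parse import urlsplit

parts = urlsplit("http://127.0.0.1:8080/test?v=123#this")

# Each attribute corresponds to a component of the RFC 3986 generic syntax.
print(parts.scheme)                # http
print(parts.netloc)                # 127.0.0.1:8080  (the authority)
print(parts.path)                  # /test
print(parts.query)                 # v=123
print(parts.fragment)              # this
print(parts.hostname, parts.port)  # 127.0.0.1 8080
```

Note that urlsplit only splits; it is not a validator and does not check that each component contains only the characters the RFC allows there.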

Based on this related answer, the list you are looking for is: A-Z, a-z, 0-9, -, ., _, ~, :, /, ?, #, [, ], @, !, $, &, ', (, ), *, +, ,, ;, and =. Every other character must be percent-encoded (URL-encoded) as % followed by two hex digits; for example, a space becomes %20. Also, some of these characters may only appear in specific parts of a URI; the RFC spells out all of these rules.
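In RFC 3986 terms, that list splits into unreserved characters (A-Z a-z 0-9 - . _ ~), which never need encoding, and reserved characters (: / ? # [ ] @ ! $ & ' ( ) * + , ; =), which act as delimiters and must be percent-encoded when they appear as data. A minimal sketch of the "everything else must be encoded" rule using Python's urllib.parse.quote, assuming the input is text to be encoded as UTF-8 and that any reserved characters in it are meant as delimiters (the name encode_url_text is hypothetical):

```python
from urllib.parse import quote

# Reserved characters (gen-delims + sub-delims in RFC 3986). We assume they are
# used as delimiters in the input, so they are passed to quote() as "safe".
RESERVED = ":/?#[]@!$&'()*+,;="


def encode_url_text(s: str) -> str:
    """Hypothetical helper: percent-encode everything outside the listed set."""
    # quote() never encodes ASCII letters, digits and "_.-~" (Python 3.7+),
    # keeps the characters in `safe`, and turns everything else into %XX
    # escapes of its UTF-8 bytes (uppercase hex).
    return quote(s, safe=RESERVED)


print(encode_url_text("/path with spaces/ü?q=1"))
# /path%20with%20spaces/%C3%BC?q=1
```

Because % is not in the safe set, an existing % is encoded as %25, so running this twice on the same string double-encodes it.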
