Regex to validate the index of a website vs. a spe

Posted 2020-08-01 05:24

Question:

I'm looking for a regex that tells me whether a string refers to a website's root address rather than to a specific page on that website.

So it would match:

http://google.com
ftp://google.com
http://google.com/
http://lots.of.subdomains.google.com

But not:

http://google.com/search.whatever
ftp://google.com/search.whatever
http://lots.of.subdomains.google.com/search.whatever

Any ideas? I can't quite figure out how to allow the optional / at the end of the URL.

Answer 1:

Try this:

(http|ftp|https)://([a-zA-Z0-9\-\.]+)/?
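The pattern above has no anchors, so whether it rejects page URLs depends on how it is applied. Here is a minimal sketch that checks it against the examples from the question. It assumes Python's re module, and the helper name is_site_root is hypothetical (neither comes from the answer). fullmatch is used so that the whole string has to match:

```python
import re

# Answer 1's pattern, copied unchanged.
PATTERN = re.compile(r"(http|ftp|https)://([a-zA-Z0-9\-\.]+)/?")

def is_site_root(url: str) -> bool:
    """Return True only if the entire string matches the pattern."""
    return PATTERN.fullmatch(url) is not None

should_match = [
    "http://google.com",
    "ftp://google.com",
    "http://google.com/",
    "http://lots.of.subdomains.google.com",
]
should_not_match = [
    "http://google.com/search.whatever",
    "ftp://google.com/search.whatever",
    "http://lots.of.subdomains.google.com/search.whatever",
]

for url in should_match + should_not_match:
    print(url, is_site_root(url))
```

With a plain prefix match (PATTERN.match) instead of fullmatch, the page URLs would also be accepted, because the pattern is satisfied by the leading http://google.com/ part alone.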


Answer 2:

This is a shortened version of my full URI validation pattern, based on the specification. I wrote this because the specification allows many characters never included in any validation pattern I've found on the web. You'll see that the user/pass (and in the second pattern, path and query string) are far more permissive than you'd have thought.

/^(https?|ftp):\/\/(?#                                      protocol
)(([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+(?#         username
)(:([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+)?(?#      password
)@)?(?#                                                     auth requires @
)((([a-z0-9][a-z0-9-]*[a-z0-9]\.)*(?#                       domain segments AND
)[a-z]{2}[a-z0-9-]*[a-z0-9](?#                              top level domain OR
)|((\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])\.){3}(?#
    )(\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])(?#             IP address
))(:\d+)?(?#                                                port
))\/?$/i

And since I've taken the time to break this out to be somewhat more readable, here is the complete pattern:

/^(https?|ftp):\/\/(?#                                      protocol
)(([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+(?#         username
)(:([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+)?(?#      password
)@)?(?#                                                     auth requires @
)((([a-z0-9][a-z0-9-]*[a-z0-9]\.)*(?#                       domain segments AND
)[a-z]{2}[a-z0-9-]*[a-z0-9](?#                              top level domain OR
)|((\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])\.){3}(?#
    )(\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])(?#             IP address
))(:\d+)?(?#                                                port
))(((\/+([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)*(?# path
)(\?([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)(?#      query string
)?)?)?(?#                                                   path and query string optional
)(#([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)?(?#      fragment
)$/i

Note that JavaScript regular expressions do not support (?#...) comments, so the comments have to be removed before these patterns are used there.
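For engines without comment support, the comments can be stripped mechanically. This is a minimal sketch in Python; the function name strip_inline_comments and the short sample pattern are made up for illustration and are not from the answer. It assumes no comment contains a closing parenthesis and that the sequence (?# never appears inside a character class, which holds for the layout used above:

```python
import re

def strip_inline_comments(pattern: str) -> str:
    """Remove (?#...) comment groups, including any newlines inside them."""
    return re.sub(r"\(\?#[^)]*\)", "", pattern)

# A short pattern laid out like the ones above: each comment swallows
# the line break, so the pattern itself contains no literal newlines.
commented = r"""^(https?|ftp):\/\/(?#   protocol
)([a-z0-9.-]+)(?#                       host, simplified for this example
)\/?$"""

stripped = strip_inline_comments(commented)
print(stripped)

# Python's re module accepts (?#...) comments directly, so both forms
# can be compiled and compared.
with_comments = re.compile(commented, re.IGNORECASE)
without_comments = re.compile(stripped, re.IGNORECASE)
for url in ["http://google.com/", "http://google.com/search.whatever"]:
    print(url, bool(with_comments.search(url)), bool(without_comments.search(url)))
```

The stripped string is what would be pasted into a JavaScript regex literal or passed to another engine that rejects comment groups.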



Answer 3:

Great answer by Jeremy. Depending on which regex dialect you're using to match, you might want to wrap the whole expression with anchors (to avoid matching URLs like http://example.com/bin/cgi?returnUrl=http://google.com), and maybe generalize the valid protocol and domain name characters:

^\w+://(\w+\.)+\w+/?$
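To see what the anchors change, here is a small sketch in Python (the choice of language and the variable names are assumptions for illustration) comparing the pattern above with an unanchored copy on the embedded-URL example:

```python
import re

ANCHORED = re.compile(r"^\w+://(\w+\.)+\w+/?$")
UNANCHORED = re.compile(r"\w+://(\w+\.)+\w+/?")

embedded = "http://example.com/bin/cgi?returnUrl=http://google.com"

print(bool(ANCHORED.search("http://google.com/")))   # True
print(bool(ANCHORED.search(embedded)))               # False
print(bool(UNANCHORED.search(embedded)))             # True: finds a URL inside the string
print(bool(ANCHORED.search("http://my-site.com")))   # False: \w does not include "-"
```

The last line shows one cost of the generalization: \w covers letters, digits and underscore only, so hostnames containing hyphens are rejected unless the character class is widened.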


Tags: regex url