I'm looking for a regex that validates whether a string refers to a website's root address, as opposed to a specific page on that website.
So it would match:
http://google.com
ftp://google.com
http://google.com/
http://lots.of.subdomains.google.com
But not:
http://google.com/search.whatever
ftp://google.com/search.whatever
http://lots.of.subdomains.google.com/search.whatever
Any ideas? I can't quite figure out how to handle allowing the / at the end of the URL.
Try this:
(http|ftp|https)://([a-zA-Z0-9\-\.]+)/?
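As written, the pattern has no anchors, so a search-style match will still succeed on the prefix of a page URL such as http://google.com/search.whatever. A minimal sketch in Python of how it could be applied to the whole string (the names ROOT_URL and is_site_root are made up for illustration, not part of the answer):

```python
import re

# Pattern from the answer above, unchanged.
ROOT_URL = re.compile(r"(http|ftp|https)://([a-zA-Z0-9\-\.]+)/?")

def is_site_root(url: str) -> bool:
    """Return True only if the entire string is scheme://host with an optional trailing slash."""
    # fullmatch requires the pattern to consume the whole string,
    # which has the same effect as wrapping it in ^ ... $.
    return ROOT_URL.fullmatch(url) is not None

if __name__ == "__main__":
    for candidate in (
        "http://google.com",
        "http://google.com/",
        "http://google.com/search.whatever",
    ):
        print(candidate, is_site_root(candidate))
```

In dialects without a full-match call, adding ^ at the start and $ at the end of the pattern should give the same behaviour.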
This is a shortened version of my full URI validation pattern, based on the specification. I wrote this because the specification allows many characters never included in any validation pattern I've found on the web. You'll see that the user/pass (and in the second pattern, path and query string) are far more permissive than you'd have thought.
/^(https?|ftp):\/\/(?# protocol
)(([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+(?# username
)(:([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+)?(?# password
)@)?(?# auth requires @
)((([a-z0-9][a-z0-9-]*[a-z0-9]\.)*(?# domain segments AND
)[a-z]{2}[a-z0-9-]*[a-z0-9](?# top level domain OR
)|((\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])\.){3}(?#
)(\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])(?# IP address
))(:\d+)?(?# port
))\/?$/i
And since I've taken the time to break this out to be somewhat more readable, here is the complete pattern:
/^(https?|ftp):\/\/(?# protocol
)(([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+(?# username
)(:([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+)?(?# password
)@)?(?# auth requires @
)((([a-z0-9][a-z0-9-]*[a-z0-9]\.)*(?# domain segments AND
)[a-z]{2}[a-z0-9-]*[a-z0-9](?# top level domain OR
)|((\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])\.){3}(?#
)(\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])(?# IP address
))(:\d+)?(?# port
))(((\/+([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)*(?# path
)(\?([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)(?# query string
)?)?)?(?# path and query string optional
)(#([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)?(?# fragment
)$/i
Note that JavaScript regular expressions do not support (?# ...) comments, so the comments and line breaks have to be removed before using either pattern there.
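For engines without comment support, one option is to strip the comments mechanically rather than by hand. A rough sketch in Python, assuming (as is the case in the patterns above) that no comment contains a closing parenthesis and that "(?#" never appears as literal text; the helper name strip_inline_comments and the short sample pattern are made up for illustration:

```python
import re

# Matches an inline comment group: "(?#" up to the first ")".
# [^)] also matches newlines, so comments that span a line break are removed whole.
INLINE_COMMENT = re.compile(r"\(\?#[^)]*\)")

def strip_inline_comments(pattern: str) -> str:
    """Remove (?# ...) comment groups from a regex source string."""
    return INLINE_COMMENT.sub("", pattern)

# Hypothetical, much-shortened pattern laid out in the same commented style.
commented = "^(https?|ftp)://(?# protocol\n)([a-z0-9.-]+)(?# host\n)/?$"
plain = strip_inline_comments(commented)

if __name__ == "__main__":
    print(plain)
```

Because each comment in this layout swallows the line break that follows it, removing the comments also joins the pattern back into a single line.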
Great answer by Jeremy. Depending on which regex dialect you're using to match, you might want to wrap the whole expression with anchors (to avoid matching URLs like http://example.com/bin/cgi?returnUrl=http://google.com), and maybe generalize the valid protocol and domain name characters:
^\w+://(\w+\.)+\w+/?$
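A quick way to check this generalized pattern against the examples from the question, sketched in Python (the name GENERAL_ROOT and the hyphenated test host are assumptions added for illustration). One thing to keep in mind is that \w does not include the hyphen, so hosts such as my-site.example.com are rejected by this pattern:

```python
import re

# Generalized pattern from the answer above, unchanged.
GENERAL_ROOT = re.compile(r"^\w+://(\w+\.)+\w+/?$")

should_match = [
    "http://google.com",
    "ftp://google.com",
    "http://google.com/",
    "http://lots.of.subdomains.google.com",
]
should_not_match = [
    "http://google.com/search.whatever",
    "ftp://google.com/search.whatever",
    "http://lots.of.subdomains.google.com/search.whatever",
    "http://example.com/bin/cgi?returnUrl=http://google.com",
]

def matches(url: str) -> bool:
    """Return True if the URL is accepted by the generalized pattern."""
    return GENERAL_ROOT.match(url) is not None

if __name__ == "__main__":
    for url in should_match + should_not_match:
        print(url, matches(url))
    # Hyphenated host: rejected, because "-" is not a \w character.
    print("http://my-site.example.com", matches("http://my-site.example.com"))
```

If hyphenated hosts need to pass, replacing \w with a class such as [\w-] in the host part is one possible adjustment.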