Anyone want to try their hand at coming up with a regex that matches all of these:
- /foo/bar/baz.gif
- /foo/bar/
- http://www.foo.com/foo/bar
I think it might be impossible to do it with one regex, but you never know.
EDIT: To clarify, what I'm trying to do is pick out all URIs from a document (not an HTML document).
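For the clarified goal (extracting URIs from plain text), here is a minimal sketch of one regex that covers all three examples. It assumes a URI either starts with a scheme such as `http://` or starts with `/`. `URI_PATTERN` and `extractUris` are hypothetical names, and the trailing-punctuation trim is an added assumption about prose, not part of the question.

```javascript
// Hypothetical pattern: optional "scheme://host", then a path that must start with "/".
// The path runs until whitespace, a quote, an apostrophe, or an angle bracket.
const URI_PATTERN = /(?:[a-z][a-z0-9+.-]*:\/\/[^\s"'<>\/]+)?\/[^\s"'<>]*/gi;

function extractUris(text) {
  // match() with the "g" flag returns every match, or null when there are none.
  const raw = text.match(URI_PATTERN) ?? [];
  // Assumption: trailing sentence punctuation belongs to the prose, not the URI.
  // This also strips a legitimate trailing ")" from URLs that end in one.
  return raw.map((u) => u.replace(/[.,;:!?)]+$/, ""));
}

const doc = "See /foo/bar/baz.gif, the folder /foo/bar/ and http://www.foo.com/foo/bar for details.";
console.log(extractUris(doc));
```

The weak spot is the bare `/` branch: ordinary words like "and/or" will produce a false positive (`/or`), which is the usual price of accepting scheme-less paths.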
Similar to Alex's.
Rationale for this answer:
Caveats:
Edit: whoops, fixed closing paren problem.
That matches all of these, but maybe you had stricter conditions in mind?
That's a tricky one, because there are so many valid characters in URLs (before they get URL-encoded).
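To make "so many valid characters" concrete, here is a sketch of a character class built from the sets RFC 3986 (section 2) allows to appear literally in a URI, plus `%` for percent-encoding. `URI_CHAR` and `onlyUriChars` are hypothetical names; the check only looks at the character set, not at URI structure.

```javascript
// Characters RFC 3986 allows literally in a URI:
//   unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
//   gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
//   sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
// plus "%" for percent-encoded octets such as %20.
const URI_CHAR = "[A-Za-z0-9\\-._~:/?#\\[\\]@!$&'()*+,;=%]";

// Hypothetical check: does `s` consist only of characters a URI may contain unencoded?
// It does not verify that each "%" is followed by two hex digits.
const ALL_URI_CHARS = new RegExp(`^${URI_CHAR}+$`);

function onlyUriChars(s) {
  return ALL_URI_CHARS.test(s);
}

console.log(onlyUriChars("/a%20b?x=1&y=(2)#top"));
console.log(onlyUriChars("/foo bar"));
```

Note that apostrophes and parentheses are legal URI characters, so any pattern that stops at them to avoid swallowing surrounding punctuation is trading strict correctness for practicality.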
Here's my shot:
Also similar to Alex's. The only problem I found with Alex's is that it wouldn't match characters like pound signs (#) and dashes; mine will match all of those.
EDIT: In fact, the only thing that keeps it from being too greedy is that it is told NOT to match whitespace, quotes, apostrophes, or chevrons (< and >).
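A minimal sketch of that "stop only at whitespace, quotes, apostrophes, or chevrons" idea, assuming a candidate starts with either `scheme://` or `/`. This is an illustration of the technique, not the answer's exact regex; `GREEDY_URI` and `findUris` are hypothetical names.

```javascript
// After the start marker, everything up to whitespace, ", ', <, or > is taken,
// so "#" and "-" are matched naturally.
const GREEDY_URI = /(?:[a-z][a-z0-9+.-]*:\/\/|\/)[^\s"'<>]+/gi;

function findUris(text) {
  return text.match(GREEDY_URI) ?? [];
}

const sample = `Links: "http://www.foo.com/my-page#section-2" and </foo/bar-baz/> and '/foo/bar/baz.gif'.`;
console.log(findUris(sample));

// The downside of excluding so little: trailing punctuation is kept.
console.log(findUris("(see /foo/bar)."));
```

Quotes and chevrons work well as delimiters because they usually surround a URI in text; characters like `)` and `.` are left in because they can legitimately appear inside a URI.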
I used named capture groups. The matches are better when the scheme is present; without it, www.foo.com/bar would only match /bar.
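Here is a sketch of what named capture groups (ES2018 `(?<name>...)` syntax) look like for this problem, and why a missing scheme degrades the match. The pattern and the `describe` helper are hypothetical, assuming the host is only recognized after `scheme://`.

```javascript
// Scheme and host are optional as a pair; the path must start with "/".
const NAMED_URI = /(?:(?<scheme>[a-z][a-z0-9+.-]*):\/\/(?<host>[^\s"'<>\/]+))?(?<path>\/[^\s"'<>]*)/gi;

function describe(text) {
  // matchAll() requires the "g" flag; unmatched groups come back as undefined.
  return [...text.matchAll(NAMED_URI)].map((m) => ({
    match: m[0],
    scheme: m.groups.scheme,
    host: m.groups.host,
    path: m.groups.path,
  }));
}

console.log(describe("http://www.foo.com/foo/bar"));
console.log(describe("www.foo.com/bar")); // no scheme, so only the path is found
```

Without `://` there is nothing that marks `www.foo.com` as a host rather than a word, so the regex falls back to the path branch.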
This is what you could do for JavaScript:
Test data:
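A minimal JavaScript sketch of checking a regex against test data, using a hypothetical anchored pattern `WHOLE_URI` (not necessarily the one this answer used). The three strings from the question should pass; the two extra rows are assumed negatives.

```javascript
// The whole string must be one URI: optional "scheme://host", then a path starting with "/".
const WHOLE_URI = /^(?:[a-z][a-z0-9+.-]*:\/\/[^\s"'<>\/]+)?\/[^\s"'<>]*$/i;

// [input, expected result]
const testData = [
  ["/foo/bar/baz.gif", true],
  ["/foo/bar/", true],
  ["http://www.foo.com/foo/bar", true],
  ["foo/bar", false],          // no scheme and no leading slash
  ["/foo/bar baz.gif", false], // contains whitespace
];

for (const [input, expected] of testData) {
  // No "g" flag on purpose: with "g", test() advances lastIndex between calls
  // and can report false on the next string.
  const actual = WHOLE_URI.test(input);
  console.log(`${actual === expected ? "ok  " : "FAIL"} ${JSON.stringify(input)} -> ${actual}`);
}
```

Keeping the validation regex non-global avoids the `lastIndex` pitfall when one `RegExp` object is reused across many `test()` calls.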
It's not easy, and you may end up catching "too much URI", but what about this:
Basically, there are a few groups in there: one defining the protocol, one looking for the directory, and one looking for a file at the end. But this approach is very limited! If you need real URI validation and splitting into parts (port, username, password, filtering out unwanted characters), you will probably end up with a much more complex expression. Good luck!
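A sketch of the protocol / directory / file grouping described above, with the protocol and host split into two groups. `PARTS` and `splitUri` are hypothetical, and the rule "the file is whatever follows the last slash" is an assumption. The last line shows the limitation mentioned: username, password, and port all land in `host` unsplit.

```javascript
// protocol (optional, with host) / directory (up to the last "/") / file (the rest, may be "")
const PARTS = /^(?:(?<protocol>[a-z][a-z0-9+.-]*):\/\/(?<host>[^\/\s]+))?(?<directory>\/(?:[^\/\s]+\/)*)(?<file>[^\/\s]*)$/i;

function splitUri(uri) {
  const m = uri.match(PARTS);
  if (!m) return null;
  const { protocol, host, directory, file } = m.groups;
  return { protocol, host, directory, file };
}

console.log(splitUri("/foo/bar/baz.gif"));
console.log(splitUri("http://www.foo.com/foo/bar"));
// Limitation: credentials and port are not separated from the host.
console.log(splitUri("http://user:pw@www.foo.com:8080/a/b.gif"));
```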
Update:
You didn't ask for this, but for those coming from search engines who want to learn more about regex, I'd like to plug the free program I used for this attempt, "The Regex Coach" (no, I'm not affiliated).