How do I get specific path sections from a url? For example, I want a function which operates on this:
http://www.mydomain.com/hithere?image=2934
and returns "hithere"
or operates on this:
http://www.mydomain.com/hithere/something/else
and returns the same thing ("hithere")
I know this will probably use urllib or urllib2 but I can't figure out from the docs how to get only a section of the path.
A combination of urlparse and os.path.split will do the trick. The following script stores all sections of a url in a list, backwards.
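A minimal Python 3 sketch of such a script. The function name `path_sections_reversed` is made up for illustration, and the import uses the Python 3 module location rather than the Python 2 `urlparse` module:

```python
import os.path
from urllib.parse import urlparse


def path_sections_reversed(url):
    """Return the path sections of url, last section first."""
    path = urlparse(url).path
    sections = []
    while True:
        # os.path.split returns (everything before the last section, last section)
        head, tail = os.path.split(path)
        if tail:
            sections.append(tail)
        if head == path:  # nothing left to strip, e.g. "/" or ""
            break
        path = head
    return sections


sections = path_sections_reversed("http://www.mydomain.com/hithere/something/else")
print(sections)
```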
This would return: ["else", "something", "hithere"]
Extract the path component of the URL with urlparse:
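A short sketch of this step, assuming Python 3 (where `urlparse` lives in `urllib.parse`) and the second URL from the question:

```python
from urllib.parse import urlparse

url = "http://www.mydomain.com/hithere/something/else"
# .path holds only the path component; scheme, host and query are separate fields
path = urlparse(url).path
print(path)  # /hithere/something/else
```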
Split the path into components with os.path.split:
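For instance, a single call peels off the last section. This is only a sketch, with the path value taken from the question's second URL:

```python
import os.path

# split returns (head, tail): everything before the last section, and the last section
head, tail = os.path.split("/hithere/something/else")
print(head)  # /hithere/something
print(tail)  # else
```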
The dirname and basename functions give you the two pieces of the split; perhaps use dirname in a while loop:
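One way that loop might look, as a sketch only. The helper name `first_section` is hypothetical, and the loop stops once the parent directory is nothing but the root:

```python
import os.path


def first_section(path):
    """Return the first section of a URL path such as "/hithere/something/else"."""
    while True:
        parent = os.path.dirname(path)
        # Stop when the parent is the root (or empty), or when dirname makes no progress
        if parent == path or parent.strip("/") == "":
            return os.path.basename(path)
        path = parent


print(first_section("/hithere/something/else"))  # hithere
```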
Note that in Python 3 the import has changed to
from urllib.parse import urlparse
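If the same script has to run on both Python 2 and Python 3, a common pattern is to try the new location first and fall back to the old one. A sketch using the first URL from the question; the variable names are illustrative:

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2

parts = urlparse("http://www.mydomain.com/hithere?image=2934")
path = parts.path    # "/hithere"
query = parts.query  # "image=2934"

# First path section: drop the leading slash, then take everything up to the next one
first = path.lstrip("/").split("/")[0]
print(first)  # hithere
```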
See the documentation.
Python 3.4+ solution:
The best option is to use the posixpath module when working with the path component of URLs. This module has the same interface as os.path and consistently operates on POSIX paths when used on POSIX and Windows NT based platforms.
Sample Code:
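A minimal sketch of this approach, not a definitive implementation. The function name `url_path_sections` and its `module` parameter are hypothetical. The sketch unquotes the whole path before splitting, so an encoded slash (`%2F`) would be treated as a separator. The last two cases compare posixpath and ntpath on a path containing a backslash:

```python
import ntpath
import posixpath
from urllib.parse import unquote, urlparse


def url_path_sections(url, module=posixpath):
    """Return the path sections of url in order, e.g. ["hithere", "something"]."""
    path = unquote(urlparse(url).path)
    if not path:
        return []  # normpath("") would otherwise yield "."
    path = module.normpath(path)  # collapses "//" and resolves "." and ".."
    sections = []
    while True:
        head, tail = module.split(path)
        if tail:
            sections.insert(0, tail)
        if head == path:  # reached the root, nothing left to split
            break
        path = head
    return sections


print(url_path_sections("http://www.mydomain.com/hithere?image=2934"))
print(url_path_sections("http://www.mydomain.com/hithere//something/./else/"))
# Same URL with an encoded backslash (%5C), parsed with each module
print(url_path_sections("http://www.mydomain.com/hithere%5Csomething/else"))
print(url_path_sections("http://www.mydomain.com/hithere%5Csomething/else", module=ntpath))
```

With posixpath the backslash stays inside the first section; with ntpath it is treated as a path separator, which is not what a URL path means.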
Code output:
Notes:
- On Windows NT based platforms, os.path is ntpath.
- On POSIX platforms, os.path is posixpath.
- ntpath will not handle backslashes (\) correctly (see last two cases in code/output), which is why posixpath is recommended.
- Use urllib.parse.unquote to decode percent-escapes in the path.
- Consider using posixpath.normpath to normalize the path.
- The meaning of multiple adjacent path separators (/) is not defined by RFC 3986. However, posixpath collapses multiple adjacent path separators (i.e. it treats ///, // and / the same).