I am using Python to extract the filename from a link using rfind like below:
url = "http://www.google.com/test.php"
print url[url.rfind("/") +1 : ]
This works ok with links without a / at the end of them and returns "test.php". I have encountered links with / at the end like so "http://www.google.com/test.php/". I am have trouble getting the page name when there is a "/" at the end, can anyone help?
Cheers
Just removing the slash at the end won't work, as you can probably have a URL that looks like this:
http://www.google.com/test.php?filepath=tests/hey.xml
...in which case you'll get back "hey.xml". Instead of manually checking for this, you can use urlparse to get rid of the parameters, then do the check other people suggested:
from urlparse import urlparse
url = "http://www.google.com/test.php?something=heyharr/sir/a.txt"
f = urlparse(url)[2].rstrip("/")
print f[f.rfind("/")+1:]
Use [r]strip to remove trailing slashes:
url.rstrip('/').rsplit('/', 1)[-1]
If a wider range of possible URLs is possible, including URLs with ?queries, #anchors or without a path, do it properly with urlparse:
path= urlparse.urlparse(url).path
return path.rstrip('/').rsplit('/', 1)[-1] or '(root path)'
Filenames with a slash at the end are technically still path definitions and indicate that the index file is to be read. If you actually have one that' ends in test.php/
, I would consider that an error. In any case, you can strip the / from the end before running your code as follows:
url = url.rstrip('/')
There is a library called urlparse that will parse the url for you, but still doesn't remove the / at the end so one of the above will be the best option
Just for fun, you can use a Regexp:
import re
print re.search('/([^/]+)/?$', url).group(1)
You could use
print url[url.rstrip("/").rfind("/") +1 : ]
filter(None, url.split('/'))[-1]
(But urlparse is probably more readable, even if more verbose.)