Python Find Question

2019-07-20 18:02发布

问题:

I am using Python to extract the filename from a link using rfind like below:

url = "http://www.google.com/test.php"

print url[url.rfind("/") +1 : ]

This works ok with links without a / at the end of them and returns "test.php". I have encountered links with / at the end like so "http://www.google.com/test.php/". I am have trouble getting the page name when there is a "/" at the end, can anyone help?

Cheers

回答1:

Just removing the slash at the end won't work, as you can probably have a URL that looks like this:

http://www.google.com/test.php?filepath=tests/hey.xml

...in which case you'll get back "hey.xml". Instead of manually checking for this, you can use urlparse to get rid of the parameters, then do the check other people suggested:

from urlparse import urlparse
url = "http://www.google.com/test.php?something=heyharr/sir/a.txt"
f = urlparse(url)[2].rstrip("/")
print f[f.rfind("/")+1:]


回答2:

Use [r]strip to remove trailing slashes:

url.rstrip('/').rsplit('/', 1)[-1]

If a wider range of possible URLs is possible, including URLs with ?queries, #anchors or without a path, do it properly with urlparse:

path= urlparse.urlparse(url).path
return path.rstrip('/').rsplit('/', 1)[-1] or '(root path)'


回答3:

Filenames with a slash at the end are technically still path definitions and indicate that the index file is to be read. If you actually have one that' ends in test.php/, I would consider that an error. In any case, you can strip the / from the end before running your code as follows:

url = url.rstrip('/')


回答4:

There is a library called urlparse that will parse the url for you, but still doesn't remove the / at the end so one of the above will be the best option



回答5:

Just for fun, you can use a Regexp:

import re
print re.search('/([^/]+)/?$', url).group(1)


回答6:

You could use

print url[url.rstrip("/").rfind("/") +1 : ]


回答7:

filter(None, url.split('/'))[-1]

(But urlparse is probably more readable, even if more verbose.)



标签: python url