If I open a file using urllib2, like so:
```python
remotefile = urllib2.urlopen('http://example.com/somefile.zip')
```
Is there an easy way to get the file name other than parsing the original URL?
EDIT: changed openfile to urlopen... not sure how that happened.
EDIT2: I ended up using:
```python
filename = url.split('/')[-1].split('#')[0].split('?')[0]
```
Unless I'm mistaken, this should strip out any query string or fragment as well.
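One caveat to check against the split chain above: because it splits on "/" first, a "/" inside the query string (for example a hypothetical http://example.com/a.zip?next=/b/c.zip) ends up as the last piece. A minimal sketch, assuming Python 3, that compares the chain as written with a reordered version that strips the fragment and query before taking the last path segment (both function names are just illustrative):

```python
def filename_split_chain(url):
    # The chain as written above: last "/"-separated piece, then drop "#..." and "?..."
    return url.split('/')[-1].split('#')[0].split('?')[0]


def filename_split_chain_reordered(url):
    # Drop the fragment and the query first, so a "/" inside them
    # can no longer be mistaken for a path separator.
    return url.split('#')[0].split('?')[0].split('/')[-1]


plain = 'http://example.com/somefile.zip?foo=bar#top'
tricky = 'http://example.com/a.zip?next=/b/c.zip'

print(filename_split_chain(plain), filename_split_chain_reordered(plain))    # somefile.zip somefile.zip
print(filename_split_chain(tricky), filename_split_chain_reordered(tricky))  # c.zip a.zip
```

Both versions agree on ordinary URLs; they only differ when the query or fragment itself contains a "/".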
If you only want the file name itself, and assuming there are no query variables at the end (as in http://example.com/somedir/somefile.zip?foo=bar), you can use os.path.basename for this:
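A minimal sketch of that (written for Python 3, though os.path.basename behaves the same way in Python 2), including what happens when a query string is present:

```python
import os.path

url = 'http://example.com/somedir/somefile.zip'
name = os.path.basename(url)
print(name)  # somefile.zip

# With a query string, basename keeps it attached to the file name.
url_with_query = 'http://example.com/somedir/somefile.zip?foo=bar'
name_with_query = os.path.basename(url_with_query)
print(name_with_query)  # somefile.zip?foo=bar
```

On Windows os.path is ntpath, which also treats "/" as a separator, so the result should be the same there; posixpath.basename is the explicit, platform-independent choice for URLs.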
Some other posters mentioned using urlparse, which will work, but you'd still need to strip the leading directories from the path. If you use os.path.basename(), you don't have to worry about that, since it returns only the final part of the URL or file path.
You could also combine the two best-rated answers: use urlparse.urlsplit() (also reachable as urllib2.urlparse.urlsplit(), since urllib2 imports urlparse) to get the path part of the URL, and then os.path.basename for the actual file name.
The full code would be:
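A minimal sketch of that combination, written for Python 3, where the urlparse module became urllib.parse (in Python 2 the import would be from urlparse import urlsplit). The function name filename_from_url is just illustrative:

```python
import os.path
from urllib.parse import urlsplit  # Python 2: from urlparse import urlsplit


def filename_from_url(url):
    # urlsplit separates scheme, netloc, path, query and fragment,
    # so basename only ever sees the path component.
    path = urlsplit(url).path
    return os.path.basename(path)


print(filename_from_url('http://example.com/somedir/somefile.zip?foo=bar'))  # somefile.zip
print(filename_from_url('http://example.com/a.zip?next=/b/c.zip'))           # a.zip
print(repr(filename_from_url('http://example.com/somedir/')))                # ''
```

Percent-encoded characters such as %20 are left as-is; wrapping the path in urllib.parse.unquote decodes them. A URL ending in "/" gives an empty string, so callers may want a fallback name.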
The os.path.basename function works not only for file paths but also for URLs, so (as long as there is no query string) you don't have to parse the URL manually. It's also important to note that you should use result.url instead of the original URL in order to follow redirect responses:

Did you mean urllib2.urlopen?
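Picking up the redirect point from the os.path.basename answer: the response object records the URL that was actually retrieved, so the file name can be taken from that rather than from the URL you requested. A minimal Python 3 sketch, assuming a response from urllib.request.urlopen (urllib2 responses in Python 2 also expose .url). The helper filename_from_response, the FakeResponse stand-in and the mirror URL are all hypothetical:

```python
import os.path
from urllib.parse import urlsplit


def filename_from_response(response):
    # response.url is the final URL after any redirects were followed,
    # which may differ from the URL originally passed to urlopen().
    final_url = response.url
    return os.path.basename(urlsplit(final_url).path)


# Real usage needs network access, so it is shown as a comment:
# from urllib.request import urlopen
# with urlopen('http://example.com/somefile.zip') as remotefile:
#     print(filename_from_response(remotefile))

# Offline check with a stand-in object that only has a .url attribute.
class FakeResponse:
    url = 'http://mirror.example.com/files/real-name.zip?token=abc'


print(filename_from_response(FakeResponse()))  # real-name.zip
```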
You could potentially get the intended file name from the Content-Disposition header, if the server sends one, by checking remotefile.info()['Content-Disposition'], but as it is I think you'll just have to parse the URL.

You could use urlparse.urlsplit, but if you have any URLs like the second example, you'll end up having to pull the file name out yourself anyway:

Might as well just do this:
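A minimal Python 3 sketch combining the two suggestions in this part of the thread: use the Content-Disposition filename when the server supplies one, and otherwise fall back to the last segment of the URL path. The helper name pick_filename and the example header and URLs are hypothetical:

```python
import posixpath
from email.message import Message
from urllib.parse import unquote, urlsplit


def pick_filename(content_disposition, url):
    if content_disposition:
        # Let the email package parse the header parameters (quoting, RFC 2231).
        msg = Message()
        msg['Content-Disposition'] = content_disposition
        name = msg.get_filename()
        if name:
            # Keep only the part after the last "/" in case the server sent a path.
            return posixpath.basename(name)
    # Fallback: last segment of the URL path, with %XX escapes decoded.
    return posixpath.basename(unquote(urlsplit(url).path))


print(pick_filename('attachment; filename="report.pdf"', 'http://example.com/download?id=42'))  # report.pdf
print(pick_filename(None, 'http://example.com/files/my%20archive.zip?x=1'))  # my archive.zip
```

In Python 3 the headers object returned by remotefile.info() is itself an email.message.Message subclass, so remotefile.info().get_filename() should handle the header part directly; building a Message by hand here just keeps the sketch runnable offline.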
You could probably use a simple regular expression here. Something like:
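One possible pattern, as a sketch (Python 3; the pattern and the function name filename_from_url_regex are illustrative, not taken from the original answer): match everything up to the last "/" that comes before any "?" or "#", and capture the segment after it.

```python
import re

# [^?#]*/    greedily consumes the URL up to the last "/" before any "?" or "#"
# ([^/?#]*)  captures the final path segment, which cannot contain "/", "?" or "#"
_LAST_SEGMENT = re.compile(r'[^?#]*/([^/?#]*)')


def filename_from_url_regex(url):
    m = _LAST_SEGMENT.match(url)
    return m.group(1) if m else ''


print(filename_from_url_regex('http://example.com/somedir/somefile.zip?foo=bar'))  # somefile.zip
print(filename_from_url_regex('http://example.com/a.zip?next=/b/c.zip'))           # a.zip
```

A URL with no path at all (http://example.com) yields the host name, and one ending in "/" yields an empty string, so the urlsplit-based approach is easier to reason about for those edge cases.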
Just saw this. I normally do: