urllib2 file name

Posted 2019-01-08 12:41

If I open a file using urllib2, like so:

remotefile = urllib2.urlopen('http://example.com/somefile.zip')

Is there an easy way to get the file name other than parsing the original URL?

EDIT: changed openfile to urlopen... not sure how that happened.

EDIT2: I ended up using:

filename = url.split('/')[-1].split('#')[0].split('?')[0]

Unless I'm mistaken, this should strip out all potential queries as well.
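
A quick way to check that claim is to run the one-liner against a few URLs. The sketch below wraps it in a hypothetical helper, `naive_filename` (not a name from the question), and the sample URLs are made up for illustration. It suggests the approach holds for ordinary queries and fragments but breaks when the query string itself contains a `/`.

```python
def naive_filename(url):
    # The one-liner from the question: take the last '/'-separated piece,
    # then drop anything after '#' or '?'.
    return url.split('/')[-1].split('#')[0].split('?')[0]

plain = naive_filename('http://example.com/somefile.zip')
with_query = naive_filename('http://example.com/somefile.zip?download=1#top')
# A '/' inside the query string defeats the first split.
slash_in_query = naive_filename('http://example.com/somefile.zip?next=/home')

print(plain)           # somefile.zip
print(with_query)      # somefile.zip
print(slash_in_query)  # home
```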

14 answers
SAY GOODBYE
#2 · 2019-01-08 12:54

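Using urlsplit is the safest option, since it separates the path from any query string or fragment before you take the last segment:

import urlparse  # Python 2; in Python 3 the same functions live in urllib.parse

url = 'http://example.com/somefile.zip'
urlparse.urlsplit(url).path.split('/')[-1]  # 'somefile.zip'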
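
For readers on Python 3, where `urllib2` and `urlparse` no longer exist, roughly the same idea can be sketched with `urllib.parse`. The helper name `filename_from_url` is hypothetical, and percent-decoding the result is an extra step that the answer above does not perform.

```python
import posixpath
from urllib.parse import unquote, urlsplit

def filename_from_url(url):
    """Return the last path segment of url, or '' if there is none."""
    path = urlsplit(url).path  # scheme, host, query and fragment are dropped
    # Take the basename first, then decode, so an encoded '/' (%2F)
    # inside the name is not mistaken for a path separator.
    return unquote(posixpath.basename(path))

print(filename_from_url('http://example.com/somefile.zip?next=/home#top'))
print(filename_from_url('http://example.com/my%20file.zip'))
```

A URL that ends in `/` yields an empty string here, so callers may want their own default name for that case.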
做自己的国王
#3 · 2019-01-08 12:56

I guess it depends on what you mean by parsing. In general there is no way to get the filename without parsing the URL, i.e. the remote server usually doesn't give you a filename. However, you don't have to do much yourself; there's the urlparse module:

In [9]: urlparse.urlparse('http://example.com/somefile.zip')
Out[9]: ('http', 'example.com', '/somefile.zip', '', '', '')
聊天终结者
#4 · 2019-01-08 12:57

Do you mean urllib2.urlopen? There is no function called openfile in the urllib2 module.

Anyway, use the urllib2.urlparse functions:

>>> from urllib2 import urlparse
>>> print urlparse.urlsplit('http://example.com/somefile.zip')
('http', 'example.com', '/somefile.zip', '', '')

Voila.

兄弟一词,经得起流年.
#5 · 2019-01-08 12:57

Using PurePosixPath, which is not operating-system dependent, is a Pythonic solution for URLs that carry no query string or fragment:

>>> from pathlib import PurePosixPath
>>> path = PurePosixPath('http://example.com/somefile.zip')
>>> path.name
'somefile.zip'
>>> path = PurePosixPath('http://example.com/nested/somefile.zip')
>>> path.name
'somefile.zip'

Notice that there is no network traffic here (i.e. those URLs are never fetched); this just applies standard path-parsing rules.
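
One caveat worth illustrating: `PurePosixPath` knows nothing about URL syntax, so a query string stays attached to the name. The sketch below uses a made-up URL and assumes Python 3; it compares the raw approach with first isolating the path via `urllib.parse.urlsplit`.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

url = 'http://example.com/nested/somefile.zip?download=1'

# Treating the whole URL as a path keeps the query in the name.
raw_name = PurePosixPath(url).name
# Isolating the path component first gives the bare file name.
path_only_name = PurePosixPath(urlsplit(url).path).name

print(raw_name)        # somefile.zip?download=1
print(path_only_name)  # somefile.zip
```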

forever°为你锁心
#6 · 2019-01-08 13:01
import os
import urllib2

resp = urllib2.urlopen('http://www.example.com/index.html')
my_url = resp.geturl()  # URL actually retrieved, after any redirects

os.path.split(my_url)[1]

# 'index.html'

This is not openfile, but maybe still helps :)

看我几分像从前
#7 · 2019-01-08 13:05

I think that "the file name" isn't a very well-defined concept when it comes to HTTP transfers. The server might (but is not required to) provide one in a "Content-Disposition" header; you can try to get that with remotefile.headers['Content-Disposition']. If this fails, you probably have to parse the URI yourself.
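
The header-first, URL-second order described above could look roughly like this in Python 3. It is only a sketch: `pick_filename` is a hypothetical helper, the header values are invented, and the standard library's `email.message.Message.get_filename()` is used here as one convenient way to read the `filename` parameter rather than parsing the header by hand.

```python
import posixpath
from email.message import Message
from urllib.parse import urlsplit

def pick_filename(content_disposition, url):
    """Prefer a server-supplied name; fall back to the URL path."""
    if content_disposition:
        msg = Message()
        msg['Content-Disposition'] = content_disposition
        name = msg.get_filename()  # None when no filename parameter is present
        if name:
            # Keep only the final component in case the value contains '/'.
            return posixpath.basename(name)
    return posixpath.basename(urlsplit(url).path)

print(pick_filename('attachment; filename="report.zip"',
                    'http://example.com/download?id=7'))
print(pick_filename(None, 'http://example.com/somefile.zip?x=1'))
```

With a live response, the first argument would come from something like `response.headers.get('Content-Disposition')`. As far as I know, in Python 3 that headers object is itself a `Message` subclass, so calling `get_filename()` on it directly should also work.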
