urllib2 file name

If I open a file using urllib2, like so:

remotefile = urllib2.urlopen('http://example.com/somefile.zip')

Is there an easy way to get the file name other than parsing the original URL?

EDIT: changed openfile to urlopen... not sure how that happened.

EDIT2: I ended up using:

filename = url.split('/')[-1].split('#')[0].split('?')[0]

Unless I'm mistaken, this should strip out all potential queries as well.

Tags: python url urllib2

14 answers

SAY GOODBYE

Reply #2 · 2019-01-08 12:54

Using urlsplit is the safest option:

import urlparse

url = 'http://example.com/somefile.zip'
urlparse.urlsplit(url).path.split('/')[-1]  # 'somefile.zip'
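
A slightly fuller sketch of the same idea, written for Python 3, where the urlparse module lives at urllib.parse. The helper name filename_from_url is made up for illustration, and the percent-decoding step is an addition that the one-liner above does not perform.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url):
    """Return the last path segment of a URL, ignoring query and fragment."""
    path = urlsplit(url).path  # the path component excludes '?query' and '#fragment'
    return unquote(posixpath.basename(path))  # decode %20 and similar escapes


print(filename_from_url('http://example.com/somefile.zip'))
print(filename_from_url('http://example.com/dir/some%20file.zip?x=1#top'))
print(repr(filename_from_url('http://example.com/dir/')))
```

A URL whose path ends in a slash yields an empty string, so callers probably want a fallback name for that case.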


做自己的国王

Reply #3 · 2019-01-08 12:56

I guess it depends on what you mean by parsing. There is no way to get the filename without parsing the URL, i.e. the remote server doesn't give you a filename. However, you don't have to do much yourself; there's the urlparse module:

In [9]: urlparse.urlparse('http://example.com/somefile.zip')
Out[9]: ('http', 'example.com', '/somefile.zip', '', '', '')


聊天终结者

Reply #4 · 2019-01-08 12:57

Do you mean urllib2.urlopen? There is no function called openfile in the urllib2 module.

Anyway, use the urllib2.urlparse functions:

>>> from urllib2 import urlparse
>>> print urlparse.urlsplit('http://example.com/somefile.zip')
('http', 'example.com', '/somefile.zip', '', '')

Voila.


兄弟一词,经得起流年.

Reply #5 · 2019-01-08 12:57

Using PurePosixPath, which is not operating-system-dependent and handles simple URLs gracefully, is the Pythonic solution:

>>> from pathlib import PurePosixPath
>>> path = PurePosixPath('http://example.com/somefile.zip')
>>> path.name
'somefile.zip'
>>> path = PurePosixPath('http://example.com/nested/somefile.zip')
>>> path.name
'somefile.zip'

Note that no network traffic is involved here (those URLs are never fetched); this is just standard path parsing.
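
One caveat worth checking: PurePosixPath treats the whole string as a path, so a query string or fragment ends up inside .name. A minimal sketch (Python 3, with a made-up URL) that compares the raw result with one where the URL is split first:

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

url = 'http://example.com/nested/somefile.zip?version=2#readme'

# Passing the raw URL keeps the query and fragment in the name.
raw_name = PurePosixPath(url).name

# Splitting first hands only the path component to PurePosixPath.
clean_name = PurePosixPath(urlsplit(url).path).name

print(raw_name)
print(clean_name)
```

The first print shows the name with '?version=2#readme' still attached; the second shows just 'somefile.zip'.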


forever°为你锁心

Reply #6 · 2019-01-08 13:01

import os
import urllib2

resp = urllib2.urlopen('http://www.example.com/index.html')
my_url = resp.geturl()  # URL of the resource actually retrieved

os.path.split(my_url)[1]
# 'index.html'

This is not openfile, but maybe still helps :)
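
The useful part of geturl() is that it returns the URL actually retrieved, which can differ from the requested one after a redirect. Below is a rough Python 3 sketch of that approach. FakeResponse is a hypothetical stand-in so the example runs without network access, and filename_from_response is an illustrative name, not a library function.

```python
import posixpath
from urllib.parse import urlsplit


def filename_from_response(resp):
    """Take the filename from the final URL of a urlopen()-style response."""
    final_url = resp.geturl()  # URL after any redirects were followed
    return posixpath.basename(urlsplit(final_url).path)


class FakeResponse:
    """Hypothetical stand-in for a real response object."""

    def __init__(self, url):
        self._url = url

    def geturl(self):
        return self._url


resp = FakeResponse('http://www.example.com/downloads/index.html?lang=en')
print(filename_from_response(resp))
```

Going through urlsplit first means a query string on the final URL does not leak into the name.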


看我几分像从前

Reply #7 · 2019-01-08 13:05

I think that "the file name" isn't a very well-defined concept when it comes to HTTP transfers. The server might (but is not required to) provide one in a "Content-Disposition" header; you can try to read it with remotefile.headers['Content-Disposition']. If that fails, you probably have to parse the URI yourself.
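
A hedged sketch of that two-step strategy in Python 3: try the Content-Disposition value first, then fall back to the URL path. The function name pick_filename and the sample header values are assumptions for illustration. The header is parsed with the standard library's email.message.Message rather than by hand.

```python
import posixpath
from email.message import Message
from urllib.parse import urlsplit


def pick_filename(content_disposition, url):
    """Prefer the server-supplied filename; otherwise use the URL path."""
    if content_disposition:
        msg = Message()
        msg['Content-Disposition'] = content_disposition
        name = msg.get_filename()  # None when there is no filename parameter
        if name:
            # Keep only the last component in case the server sent a path.
            return posixpath.basename(name)
    return posixpath.basename(urlsplit(url).path)


print(pick_filename('attachment; filename="report.zip"',
                    'http://example.com/download?id=7'))
print(pick_filename(None, 'http://example.com/somefile.zip'))
```

With a real Python 3 response you would presumably pass remotefile.headers.get('Content-Disposition') and remotefile.geturl() as the two arguments.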
