Determine if a URL is a PDF or HTML file

Posted 2020-03-24 05:04

I am requesting URLs using the requests package in Python (e.g. file = requests.get(url)). The URLs do not include a file extension, and sometimes an HTML file is returned and sometimes a PDF.

Is there a way to determine whether the returned file is a PDF or HTML (or, more generally, what the file format is)? The browser is able to tell, so I assume the format must be indicated somewhere in the response.

Tags: python-3.x python-requests

1 answer

劫难

#2 · 2020-03-24 05:21

The format is given in the Content-Type response header: text/html for an HTML page, application/pdf for a PDF.

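 import requests

 r = requests.get('http://example.com/file')
 # The header may be missing, so default to an empty string.
 content_type = r.headers.get('content-type', '').lower()

 if 'application/pdf' in content_type:
     ext = '.pdf'
 elif 'text/html' in content_type:
     ext = '.html'
 else:
     ext = ''
     print('Unknown type: {}'.format(content_type))

 # r.content holds the full body; r.raw is already consumed
 # unless the request was made with stream=True.
 with open('myfile' + ext, 'wb') as f:
     f.write(r.content)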
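
Servers do not always send an accurate Content-Type (some return application/octet-stream for everything, or omit the header). As a rough sketch of one way to cope with that, the helper below first trusts the header and then falls back to looking at the first bytes of the body. The name sniff_format and the fallback itself are my own additions, not part of requests; the only facts relied on are that PDF files begin with %PDF- and that HTML documents usually begin with a doctype or an <html> tag.

```python
def sniff_format(content_type, body):
    """Return 'pdf', 'html', or 'unknown' (hypothetical helper).

    content_type: raw Content-Type header value, or None if absent.
    body: the response body (only the first 1024 bytes are inspected).
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = (content_type or '').split(';')[0].strip().lower()
    if media_type == 'application/pdf':
        return 'pdf'
    if media_type in ('text/html', 'application/xhtml+xml'):
        return 'html'

    # Fallback (assumption): guess from the leading bytes of the body.
    head = body[:1024].lstrip()
    if head.startswith(b'%PDF-'):
        return 'pdf'
    if head.lower().startswith((b'<!doctype html', b'<html')):
        return 'html'
    return 'unknown'
```

With the response from the answer above, this would be called as sniff_format(r.headers.get('content-type'), r.content). The byte check is only a heuristic, so an 'unknown' result is worth logging rather than treating as an error.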
