Sorry that the title wasn't very clear. Basically, I have a list with a whole series of URLs, and I intend to download the ones that are pictures. Is there any way to check whether a URL points to an image, so that I can just skip over the ones that don't?
Thanks in advance
You can use the requests module: make a HEAD request and check the Content-Type header. A HEAD request does not download the response body.
import requests
response = requests.head(url)  # HEAD fetches the headers only, not the body
print(response.headers.get('content-type'))
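To apply this to the whole list from the question, something like the following might work. It is only a sketch: the names is_image_content_type and filter_image_urls are made up for illustration, the 5-second timeout is an arbitrary assumption, and a server is free to report a wrong Content-Type or to reject HEAD requests altogether.

```python
def is_image_content_type(value):
    """Return True if a Content-Type header value names an image type."""
    if not value:
        return False
    # Drop parameters such as "; charset=binary" and normalise case.
    media_type = value.split(";", 1)[0].strip().lower()
    return media_type.startswith("image/")


def filter_image_urls(urls, timeout=5.0):
    """Return the URLs whose HEAD response reports an image Content-Type."""
    # Third-party dependency, imported here so the helper above
    # stays usable even where requests is not installed.
    import requests

    images = []
    for url in urls:
        try:
            # requests.head() does not follow redirects unless asked to.
            response = requests.head(url, allow_redirects=True, timeout=timeout)
        except requests.RequestException:
            continue  # unreachable or malformed URL: skip it
        if response.ok and is_image_content_type(response.headers.get("Content-Type")):
            images.append(url)
    return images
```

Calling filter_image_urls(list_of_urls) would then give back only the entries that looked like images, which can be downloaded afterwards with an ordinary GET.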
There is no reliable way. But you could find a solution that might be "good enough" in your case.
You could look at the file extension if it is present in the URL, e.g., .png or .jpg could indicate an image:
>>> import os
>>> name = url2filename('')
>>> os.path.splitext(name)[1]
>>> import mimetypes
>>> mimetypes.guess_type(name)[0]
where the url2filename() function is defined here.
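The extension check can be sketched without the linked helper. The function below is not url2filename() from the linked answer; guess_image_from_url is a hypothetical name, and it simply takes the last path segment of the URL and asks mimetypes about it. It never contacts the server, so it can be fooled by URLs with misleading or missing extensions.

```python
import mimetypes
import posixpath
from urllib.parse import unquote, urlsplit


def guess_image_from_url(url):
    """Guess from the URL path alone whether it names an image file."""
    # urlsplit() separates the path from the query string and fragment.
    path = urlsplit(url).path
    # Decode %xx escapes and keep only the last path segment.
    name = posixpath.basename(unquote(path))
    mime_type, _encoding = mimetypes.guess_type(name)
    return mime_type is not None and mime_type.startswith("image/")
```

This is cheap enough to use as a first pass, with one of the header-based checks as a fallback for URLs that carry no extension.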
You could inspect the Content-Type HTTP header:
>>> import urllib.request
>>> r = urllib.request.urlopen(url) # make HTTP GET request, read headers
>>> r.headers.get_content_type()
>>> r.headers.get_content_maintype()
>>> r.headers.get_content_subtype()
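If you want to stay in the standard library but avoid fetching the body, urllib can also send a HEAD request. A minimal sketch, assuming Python 3; headers_indicate_image and url_is_image are illustrative names, and the timeout value is arbitrary:

```python
import urllib.request
from email.message import Message


def headers_indicate_image(headers: Message) -> bool:
    """Return True if the parsed headers declare an image/* Content-Type."""
    # get_content_maintype() falls back to "text" when the header is missing.
    return headers.get_content_maintype() == "image"


def url_is_image(url, timeout=5.0):
    """Send a HEAD request and inspect the Content-Type of the response."""
    request = urllib.request.Request(url, method="HEAD")
    # urlopen() raises URLError or HTTPError on failure; callers may want to catch those.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return headers_indicate_image(response.headers)
```

response.headers is an email.message.Message subclass, which is why the same get_content_*() methods shown above are available on it.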
You could check the very beginning of the HTTP body for magic numbers indicating image files, e.g., a JPEG may start with b'\xff\xd8\xff\xe0':
>>> prefix =
>>> prefix # .png image
As @pafcu suggested in the answer to the related question, you could use imghdr.what()
>>> import imghdr
>>> imghdr.what(None, b'\x89PNG\r\n\x1a\n')
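Note that imghdr was deprecated in Python 3.11 and removed in 3.13, so on newer interpreters a small hand-written prefix check is one alternative. The sketch below covers only PNG, JPEG and GIF; IMAGE_SIGNATURES and sniff_image_type are names invented for this example, and the table would need extending for other formats.

```python
# Leading bytes ("magic numbers") of a few common image formats.
IMAGE_SIGNATURES = {
    "png": (b"\x89PNG\r\n\x1a\n",),
    "jpeg": (b"\xff\xd8\xff",),
    "gif": (b"GIF87a", b"GIF89a"),
}


def sniff_image_type(prefix):
    """Return a format name if the bytes start with a known signature, else None."""
    for name, signatures in IMAGE_SIGNATURES.items():
        # bytes.startswith() accepts a tuple of alternatives.
        if prefix.startswith(signatures):
            return name
    return None
```

Only the first few bytes of the body are needed, so it is enough to read a small chunk of the response rather than the whole file.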
You can use mimetypes
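import urllib.request
from mimetypes import guess_extension
source = urllib.request.urlopen(url)
# get_content_type() returns the bare media type, e.g. "image/png"
extension = guess_extension(source.headers.get_content_type())
print(extension)
For a PNG image this prints ".png" (guess_extension() includes the leading dot).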
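If you already have the raw header value (for example from a HEAD request), the same lookup can be done without opening the URL again. A small sketch; extension_for_content_type and its default parameter are hypothetical, and guess_extension() returns None for types it does not know, hence the fallback:

```python
import mimetypes


def extension_for_content_type(value, default=""):
    """Map a Content-Type header value to a file extension such as ".png"."""
    # Drop parameters like "; charset=binary" before the lookup.
    media_type = value.split(";", 1)[0].strip().lower()
    return mimetypes.guess_extension(media_type) or default
```

This is mostly useful for choosing a file name when saving the download, since the URL itself may not carry an extension.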