In Python, how would I check if a URL ending in .jpg exists?
ex: http://www.fakedomain.com/fakeImage.jpg
thanks
>>> import httplib
>>>
>>> def exists(site, path):
...     conn = httplib.HTTPConnection(site)
...     conn.request('HEAD', path)
...     response = conn.getresponse()
...     conn.close()
...     return response.status == 200
...
>>> exists('www.fakedomain.com', '/fakeImage.jpg')
False
If the status is anything other than 200, the resource doesn't exist at that URL. This doesn't mean that it's gone altogether: if the server returns a 301 or 302, the resource still exists, but at a different URL. To make the function handle this case, change the status check line to return response.status in (200, 301, 302).
The code below is equivalent to tikiboy's answer, but uses the high-level, easy-to-use requests library.
import requests

def exists(path):
    r = requests.head(path)
    return r.status_code == requests.codes.ok

print(exists('http://www.fakedomain.com/fakeImage.jpg'))
The requests.codes.ok equals 200, so you can substitute the exact status code if you wish.
requests.head may throw an exception if the server doesn't respond, so you might want to add a try-except construct.
Also, if you want to include codes 301 and 302, consider code 303 too, especially if you dereference URIs that denote resources in Linked Data. A URI may represent a person, but you can't download a person, so the server will redirect you to a page that describes this person using a 303 redirect.
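To make the status code discussion above concrete, here is a minimal sketch of a helper that decides which codes count as "exists". The name status_means_exists and the exact set of accepted codes are assumptions for illustration, not part of any library; adjust the set to whatever "exists" should mean for you.

```python
# Status codes treated as "the resource exists" in this sketch.
# Counting the redirect codes as success is a choice, not a rule.
EXISTS_STATUSES = frozenset({
    200,  # OK
    301,  # Moved Permanently
    302,  # Found
    303,  # See Other
})


def status_means_exists(status_code):
    """Return True if status_code is one of the accepted codes."""
    return status_code in EXISTS_STATUSES
```

With requests this could be called as status_means_exists(requests.head(url).status_code). requests.head does not follow redirects by default, so the 3xx codes are actually visible to the caller.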
Thanks for all the responses, everyone. I ended up using the following:
import urllib2

try:
    f = urllib2.urlopen(urllib2.Request(url))
    deadLinkFound = False
except Exception:
    # any failure (HTTP error, unreachable host, bad URL) counts as dead
    deadLinkFound = True
It looks like http://www.fakedomain.com/fakeImage.jpg was automatically redirected to http://www.fakedomain.com/index.html without any error.
Redirects for 301 and 302 responses are followed automatically, without any indication to the caller.
Please take a look at HTTPRedirectHandler; you might need to subclass it to handle that.
Here is a sample from Dive Into Python:
http://diveintopython3.ep.io/http-web-services.html#redirects
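As a rough sketch of the subclassing idea in Python 3 (where urllib2 became urllib.request): the handler below declines every redirect, so the 3xx status reaches the caller as an HTTPError instead of being followed silently. NoRedirectHandler and head_status are hypothetical names chosen for this example, and the 10-second timeout is an arbitrary default.

```python
import urllib.error
import urllib.request


class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so the 3xx status stays visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None tells urllib not to follow this redirect; the
        # opener then raises HTTPError carrying the original 3xx code.
        return None


def head_status(url, timeout=10):
    """Return the status code of a HEAD request, redirects not followed."""
    opener = urllib.request.build_opener(NoRedirectHandler)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 3xx, 4xx and 5xx responses all arrive here
        return err.code
```

A result of 301 or 302 then tells you the image URL itself is not being served, even though a plain urlopen would have ended up on some other page without complaint.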
The previous answers have problems when the file is on an FTP server (ftp://url.com/file). The following code works when the file is served over FTP, HTTP or HTTPS:
import urllib2

def file_exists(url):
    request = urllib2.Request(url)
    request.get_method = lambda: 'HEAD'
    try:
        response = urllib2.urlopen(request)
        return True
    except Exception:
        return False
Try it with mechanize:
import mechanize

br = mechanize.Browser()
br.set_handle_redirect(False)
try:
    br.open_novisit('http://www.fakedomain.com/fakeImage.jpg')
    print 'OK'
except Exception:
    print 'KO'
I think you can try sending an HTTP request to the URL and reading the response. If no exception is raised, it probably exists.
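A minimal Python 3 sketch of that "send a request and see whether it raises" idea, using only the standard library. The function name url_exists and the 10-second timeout are assumptions for this example. Note that urlopen follows redirects, so True only means that some page answered.

```python
import urllib.error
import urllib.request


def url_exists(url, timeout=10):
    """Return True if a HEAD request to url completes without an error."""
    try:
        # Request() itself raises ValueError for a malformed URL,
        # so it is built inside the try block.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except (urllib.error.URLError, ValueError, OSError):
        # URLError covers HTTP errors (4xx/5xx) and connection failures,
        # OSError covers lower-level problems such as timeouts.
        return False
```

Catching specific exceptions rather than using a bare except keeps Ctrl-C and programming errors from being reported as a dead link.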
I don't know why you are doing this, but in any case it should be noted that just because a request to an "image" succeeds doesn't mean the response is what you think it is (it could redirect to anything, or return any data of any type, and potentially cause problems depending on what you do with the response).
Sorry, I went on a binge reading about online exploits and how to defend against them today :P
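One cheap extra check along those lines is to look at the Content-Type the server reports, as in the sketch below. looks_like_image is a hypothetical name, and the header is only the server's claim about the data: it does not prove the bytes really are an image, so validate the content itself before trusting it.

```python
import urllib.error
import urllib.request


def looks_like_image(url, timeout=10):
    """Return True if a HEAD request succeeds and reports an image/* type."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # e.g. "image" for "image/jpeg", "text" for "text/html"
            main_type = response.headers.get_content_maintype()
    except (urllib.error.URLError, ValueError, OSError):
        return False
    return main_type == "image"
```

This catches the case where a missing image is quietly answered with an HTML page, which a bare status check would report as existing.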
This might be good enough to see if a URL to a file exists.
import urllib

if urllib.urlopen('http://www.fakedomain.com/fakeImage.jpg').code == 200:
    print 'File exists'
In Python 3.6.5:
import http.client

def exists(site, path):
    connection = http.client.HTTPConnection(site)
    connection.request('HEAD', path)
    response = connection.getresponse()
    connection.close()
    return response.status == 200

exists("www.fakedomain.com", "/fakeImage.jpg")
In Python 3, the module httplib has been renamed to http.client. You also need to remove the http:// or https:// prefix from your URL, because http.client (like httplib) treats whatever follows the : as a port number, and the port number must be numeric.
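If you would rather pass a complete URL, one option is to split it first and hand only the host and port to http.client. This is a sketch under a few assumptions: the function name exists_url and the 10-second timeout are made up for the example, only http and https are supported, and only a 200 counts as existing.

```python
import http.client
from urllib.parse import urlsplit


def exists_url(url, timeout=10):
    """HEAD a full URL; return True only when the server answers 200."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        connection_class = http.client.HTTPSConnection
    elif parts.scheme == "http":
        connection_class = http.client.HTTPConnection
    else:
        raise ValueError("unsupported URL scheme: %r" % parts.scheme)
    if not parts.hostname:
        raise ValueError("no host in URL: %r" % url)

    # hostname and port come without the scheme, so the colon problem
    # described above cannot occur; port may be None (use the default).
    connection = connection_class(parts.hostname, parts.port, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        connection.request("HEAD", path)
        return connection.getresponse().status == 200
    finally:
        connection.close()
```

It would be called as exists_url("http://www.fakedomain.com/fakeImage.jpg"). Connection errors are deliberately left to propagate here, so wrap the call in try-except if an unreachable host should count as "does not exist".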