I'm trying to write a script to test for the existence of a web page; it would be nice if it could check without downloading the whole page.
This is my jumping-off point. I've seen multiple examples use httplib in the same way; however, every site I check simply returns False.
import httplib
from httplib import HTTP
from urlparse import urlparse
def checkUrl(url):
    p = urlparse(url)
    h = HTTP(p[1])
    h.putrequest('HEAD', p[2])
    h.endheaders()
    return h.getreply()[0] == httplib.OK

if __name__ == "__main__":
    print checkUrl("http://www.stackoverflow.com") # True
    print checkUrl("http://stackoverflow.com/notarealpage.html") # False
Any ideas?
Edit
Someone suggested this, but their post was deleted. Does urllib2 avoid downloading the whole page?
import urllib2

def check(some_url):
    try:
        urllib2.urlopen(some_url)
        return True
    except urllib2.URLError:
        return False
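For what it's worth, urlopen sends a GET by default, so the server is asked for the full page. One way to get urllib2 to issue a HEAD request instead is to override get_method on a Request subclass; a minimal sketch (HeadRequest and head_check are just illustrative names):
import urllib2

class HeadRequest(urllib2.Request):
    # make urlopen send HEAD instead of GET
    def get_method(self):
        return "HEAD"

def head_check(url):
    try:
        # urlopen raises HTTPError (a URLError subclass) on 4xx/5xx
        resp = urllib2.urlopen(HeadRequest(url))
        return resp.getcode() < 400
    except urllib2.URLError:
        return False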
How about this:
import httplib
from urlparse import urlparse

def checkUrl(url):
    p = urlparse(url)
    conn = httplib.HTTPConnection(p.netloc)
    conn.request('HEAD', p.path)
    resp = conn.getresponse()
    return resp.status < 400

if __name__ == '__main__':
    print checkUrl('http://www.stackoverflow.com') # True
    print checkUrl('http://stackoverflow.com/notarealpage.html') # False
This will send an HTTP HEAD request and return True if the response status code is below 400. Notice that StackOverflow's root path returns a redirect (301), not a 200 OK, which is why your original check against httplib.OK always returned False.
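If you would rather count only a final 2xx as success, here is a rough sketch that follows a few redirects by hand (it assumes plain HTTP and absolute Location headers, so treat it as a starting point rather than a complete implementation):
import httplib
from urlparse import urlparse

def checkUrlFollow(url, max_hops=3):
    # follow up to max_hops redirects, then require a 2xx status
    for _ in range(max_hops):
        p = urlparse(url)
        conn = httplib.HTTPConnection(p.netloc)
        conn.request('HEAD', p.path or '/')
        resp = conn.getresponse()
        if resp.status in (301, 302, 303, 307):
            url = resp.getheader('location')
            continue
        return 200 <= resp.status < 300
    return False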
Using requests, this is as simple as:
import requests
ret = requests.head('http://www.example.com')
print(ret.status_code)
This just loads the website's headers. To test whether this was successful, you can check the result's status_code. Or use the raise_for_status method, which raises an exception if the request was not successful.
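For example, a quick sketch of both checks (http://www.example.com is just a placeholder):
import requests

ret = requests.head('http://www.example.com')
print(ret.status_code)            # e.g. 200

try:
    ret.raise_for_status()        # raises requests.HTTPError on 4xx/5xx
    print('page exists')
except requests.HTTPError:
    print('page does not exist')

Note that requests.head does not follow redirects by default, so a page behind a 301 will report the redirect status rather than the final one; pass allow_redirects=True if a redirect should count as existing.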
How about this:
import requests

def url_check(url):
    """Boolean return - check to see if the site exists.

    This function takes a URL as input, requests only the site's
    head (not the full HTML), and checks the response status code.
    If the status code is less than 400 it returns True, otherwise
    it returns False.
    """
    try:
        site_ping = requests.head(url)
        if site_ping.status_code < 400:
            # to view the returned status code: print(site_ping.status_code)
            return True
        else:
            return False
    except Exception:
        return False
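In practice you may also want a timeout so a hanging server cannot block the check forever; a small variation (the 5-second value is just an example):
import requests

def url_check(url, timeout=5):
    # same check as above, but give up after `timeout` seconds
    try:
        return requests.head(url, timeout=timeout).status_code < 400
    except requests.RequestException:
        return False

print(url_check('http://www.example.com'))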
You can try:
import urllib2
try:
    urllib2.urlopen(url='https://someURL')
except:
    print("page not found")