When I access a page on an IIS server to retrieve XML through the browser, using a query parameter (the HTTP URL in the example below), I get a pop-up login dialog asking for a username and password (it appears to be a standard system dialog). Once I submit it, the data arrives as an XML page.
How do I handle this with urllib? When I do the following, I am never prompted for a username/password; I just get a traceback indicating that the server (correctly) identifies me as not authorized. I am using Python 2.7 in an IPython notebook.
import urllib
f = urllib.urlopen("http://www.nalmls.com/SERetsHuntsville/Search.aspx?SearchType=Property&Class=RES&StandardNames=0&Format=COMPACT&Query=(DATE_MODIFIED=2012-09-28T00:00:00%2B)&Limit=10")
s = f.read()
f.close()
Pointers to docs are also appreciated! I did not find this exact use case.
I plan to parse the XML to CSV, if that makes a difference.
You are dealing with HTTP authentication. I've always found it tricky to get working quickly with the urllib library. The requests Python package makes it super simple.
import requests

url = "http://www.nalmls.com/SERetsHuntsville/Search.aspx?SearchType=Property&Class=RES&StandardNames=0&Format=COMPACT&Query=(DATE_MODIFIED=2012-09-28T00:00:00%2B)&Limit=10"
r = requests.get(url, auth=('user', 'pass'))
page = r.text
If you look at the response headers for that URL, you can see that it is using digest authentication:
{'content-length': '1893', 'x-powered-by': 'ASP.NET',
'x-aspnet-version': '4.0.30319', 'server': 'Microsoft-IIS/7.5',
'cache-control': 'private', 'date': 'Fri, 05 Oct 2012 18:20:54 GMT',
'content-type': 'text/html; charset=utf-8', 'www-authenticate':
'Digest realm="Solid Earth", nonce="MTAvNS8yMDEyIDE6MjE6MjUgUE0",
opaque="0000000000000000", stale=false, algorithm=MD5, qop="auth"'}
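To make that check explicit, here is a minimal sketch that pulls the scheme name out of a headers dictionary like the one above. The helper name `auth_scheme` is made up for illustration, and the sample dictionary is just a trimmed copy of the headers shown; with requests you would pass `r.headers` from an unauthenticated request instead.

```python
def auth_scheme(headers):
    """Return the lowercase scheme named in WWW-Authenticate, or None."""
    # Header names are case-insensitive, so compare them in lowercase.
    for name, value in headers.items():
        if name.lower() == 'www-authenticate' and value:
            # The scheme is the first whitespace-delimited token,
            # e.g. 'Digest realm="..."' -> 'digest'.
            return value.split(None, 1)[0].lower()
    return None


# Trimmed copy of the headers shown above.
headers = {
    'server': 'Microsoft-IIS/7.5',
    'www-authenticate': 'Digest realm="Solid Earth", '
                        'nonce="MTAvNS8yMDEyIDE6MjE6MjUgUE0", '
                        'opaque="0000000000000000", stale=false, '
                        'algorithm=MD5, qop="auth"',
}
print(auth_scheme(headers))
```

A result of `digest` means a plain `(user, pass)` tuple, which requests treats as Basic auth, will not be enough.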
So you will need:
from requests.auth import HTTPDigestAuth
r = requests.get(url, auth=HTTPDigestAuth('user', 'pass'))
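If you would rather stay with the standard library, as the question asks, urllib2 can answer the digest challenge too. The following is only a sketch: the function name `build_digest_opener` is invented here, and `'user'`/`'pass'` are placeholders as in the requests example. The import fallback is there so the same code also runs on Python 3, where these classes live in `urllib.request`.

```python
try:
    import urllib2  # Python 2, as used in the question
except ImportError:
    import urllib.request as urllib2  # Python 3 keeps the same class names here


def build_digest_opener(url, username, password):
    """Return an opener that responds to Digest challenges for url."""
    password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
    # Passing None as the realm applies these credentials to any realm
    # the server announces for this URL.
    password_mgr.add_password(None, url, username, password)
    handler = urllib2.HTTPDigestAuthHandler(password_mgr)
    return urllib2.build_opener(handler)


# Usage (performs a network request, so it is left commented out):
# opener = build_digest_opener(url, 'user', 'pass')
# s = opener.open(url).read()
```

The opener retries the request with the computed digest response after the server's initial 401, which is roughly what the browser does after you fill in its login dialog.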
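There are many ways to do it, but I suggest you start with urllib2, which comes with batteries included. For a server that accepts Basic authentication, you can set the Authorization header yourself: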
import urllib2, base64

username = "user"  # placeholder credentials
password = "pass"
req = urllib2.Request("http://webpage.com//user")
# Basic auth sends base64("username:password") in the Authorization header
b64str = base64.b64encode('%s:%s' % (username, password))
req.add_header("Authorization", "Basic %s" % b64str)
result = urllib2.urlopen(req)
You can use requests, BeautifulSoup, mechanize, or Selenium if your task gets harder. Googling will give you plenty of examples for each of these.
This can be done in a couple of ways:
- Use urllib/urllib2 or requests, as others have suggested
- Use Mechanize to simulate manual form-filling and get back the response