I'm trying to write a Python program that logs in to my university's site using my ID and password. This is the login form page: https://webapp.pucrs.br/consulta/
As you may notice, the two fields are named pr1 and pr2, and the page uses POST to send the data. There is also a cookie that is set when the page loads: a JSESSIONID containing a random value that, as I understand it, has to be sent back in the headers of the POST request to authenticate the login.
I wrote the following code, but the page returned by the GET request says "The session was not initialized", probably because the cookie was not sent back properly.
from urllib2 import Request, build_opener, HTTPCookieProcessor, HTTPHandler
import httplib, urllib, cookielib, Cookie, os
conn = httplib.HTTPConnection('webapp.pucrs.br')
#COOKIE FINDER
cj = cookielib.CookieJar()
opener = build_opener(HTTPCookieProcessor(cj),HTTPHandler())
req = Request('http://webapp.pucrs.br/consulta/principal.jsp')
f = opener.open(req)
html = f.read()
for cookie in cj:
    c = cookie
#FIM COOKIE FINDER
params = urllib.urlencode ({'pr1':111049631, 'pr2':<pass>})
headers = {"Content-type":"text/html",
           "Set-Cookie" : "JSESSIONID=70E78D6970373C07A81302C7CF800349"}
# I couldn't set the value automatically here: the cookie object can't be converted to a string, so I change this value by hand on every session to the new cookie's value. Any solutions?
conn.request ("POST", "/consulta/servlet/consulta.aluno.ValidaAluno",params, headers) # Validation page
resp = conn.getresponse()
temp = conn.request("GET","/consulta/servlet/consulta.aluno.Publicacoes") # desired content page
resp = conn.getresponse()
print resp.read()
Where do I put this cookie so the login is authenticated?
I would try using the requests library. The documentation is excellent, and the code ends up being much cleaner than with urllib*.
$ pip install requests
Using a session (see comment by Piotr) that handles cookies on its own, the result looks like this:
import requests
url_0 = "http://webapp.pucrs.br/consulta/principal.jsp"
url = "https://webapp.pucrs.br/consulta/servlet/consulta.aluno.ValidaAluno"
data = {"pr1": "123456789", "pr2": "1234"}
s = requests.session()
s.get(url_0)
r = s.post(url, data)
It seems to work fine, as I get a "Usuario inexistente" notice for pr1 123456789 and "Senha inválida" with your user number.
You have to use the same opener you created for all of your requests; it will then handle the cookies by itself.
Here is an extract of something I wrote recently:
import urllib, urllib2, cookielib
# create the opener once; its CookieJar stores cookies between requests
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookielib.CookieJar()))
# then for all requests
if postData:
    pData = urllib.urlencode(postData)
else:
    pData = None
httpReq = urllib2.Request(url, pData, self._headers)
page = opener.open(httpReq)
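As for the hand-built headers in the question: Set-Cookie is the header a server sends in its response, while the client sends cookies back in a Cookie header. Each cookie in a CookieJar has plain-string .name and .value attributes, so the header value can be assembled from those. Below is a minimal sketch in Python 3 syntax (where cookielib is named http.cookiejar). Note that cookie_header and make_cookie are hypothetical helper names, and the cookie is built by hand only so the example runs without a network connection.

```python
from http.cookiejar import Cookie, CookieJar


def cookie_header(jar):
    """Return a value for the request's Cookie header built from a CookieJar."""
    # Every cookie object exposes .name and .value as plain strings.
    return "; ".join("{}={}".format(c.name, c.value) for c in jar)


def make_cookie(name, value, domain):
    """Build a session cookie by hand (hypothetical helper, for this demo only)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


# In real use the jar is filled by HTTPCookieProcessor; here it is filled manually.
jar = CookieJar()
jar.set_cookie(make_cookie("JSESSIONID", "70E78D6970373C07A81302C7CF800349",
                           "webapp.pucrs.br"))

headers = {
    # form data sent by urlencode() is application/x-www-form-urlencoded
    "Content-type": "application/x-www-form-urlencoded",
    # "Cookie" is the request header; "Set-Cookie" only appears in responses
    "Cookie": cookie_header(jar),
}
```

If every request goes through the same opener, none of this is needed, since HTTPCookieProcessor adds the Cookie header for you.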
Converting MatthieuW's answer to Python 3 gives:
import urllib.parse, urllib.request, http.cookiejar
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))
# then for all requests
if postData:
    # in Python 3 the request body must be bytes, not str
    pData = urllib.parse.urlencode(postData).encode('utf-8')
else:
    pData = None
httpReq = urllib.request.Request(url, pData)
page = opener.open(httpReq)
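To show how the pieces fit together for the login in the question, here is a rough sketch that sends all three requests through one opener. The function names (make_opener, fetch, login_and_fetch) and the base_url parameter are my own, not part of any library; the three paths are the ones from the question. I have not run it against the real site, so treat it as a starting point.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_opener():
    """Create one opener backed by a single CookieJar; reuse it for every request."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch(opener, url, post_data=None):
    """GET when post_data is empty, otherwise POST it form-encoded; return the body as text."""
    data = None
    if post_data:
        # the request body must be bytes in Python 3
        data = urllib.parse.urlencode(post_data).encode("ascii")
    with opener.open(urllib.request.Request(url, data)) as page:
        return page.read().decode("utf-8", errors="replace")


def login_and_fetch(base_url, user, password):
    """Load the form page, post the credentials, then fetch the content page."""
    opener, _jar = make_opener()
    # 1. loading the form page lets the server set JSESSIONID in the jar
    fetch(opener, base_url + "/principal.jsp")
    # 2. the same opener sends the cookie back along with the credentials
    fetch(opener, base_url + "/servlet/consulta.aluno.ValidaAluno",
          {"pr1": user, "pr2": password})
    # 3. the session cookie is sent again for the page we actually want
    return fetch(opener, base_url + "/servlet/consulta.aluno.Publicacoes")
```

It would be called as login_and_fetch("https://webapp.pucrs.br/consulta", "123456789", "1234"), with your own ID and password in place of the dummy values.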
I recommend using mechanize: it handles sessions, cookies, and logins for you automatically. It also provides a urllib-like API and features such as form filling, so you don't have to build the POST request yourself; mechanize constructs it for you.
urllib is no good, use requests!
import requests
url = "https://webapp.pucrs.br/consulta/principal.jsp"
s = requests.Session()
# the form fields on the login page are named pr1 and pr2
p = dict(pr1 = 'dd', pr2 = 'cc')
r = s.get(url, params = p)
# pass cert='/path/to/certificate.pem' if you need a client certificate;
# otherwise use verify = False to bypass SSL verification
c = r.cookies
# Then send the next request with those same cookies
# (other_url stands for the next page you want to fetch)
r = requests.get(other_url, cookies = c, verify = False)