I'm trying to write a Python program that logs in to my university's site using my ID and password. This is the official login page: https://webapp.pucrs.br/consulta/
As you may notice, the two fields are named pr1 and pr2, and the page sends the data with POST. Also, a cookie is set when the page loads: a JSESSIONID containing a random value that, as I understand it, has to be sent back in the headers of the POST request to authenticate the login.
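The cookie round trip described above can be illustrated with the standard library's `http.cookies.SimpleCookie` (Python 3). One detail worth noting: the server delivers the cookie in a `Set-Cookie` *response* header, while the client returns it in a `Cookie` *request* header, and a urlencoded form body is normally labelled `application/x-www-form-urlencoded`. This is a minimal sketch; the `Path=/consulta` attribute is a hypothetical example, and the JSESSIONID value is the one quoted in the question's code.

```python
from http.cookies import SimpleCookie

# What the server might send when the login page is first loaded.
# The Path attribute here is a hypothetical example.
set_cookie_header = "JSESSIONID=70E78D6970373C07A81302C7CF800349; Path=/consulta"

jar = SimpleCookie()
jar.load(set_cookie_header)            # parse "name=value; attributes"

session_id = jar["JSESSIONID"].value   # just the random session value

# What the client must send back: a *Cookie* request header containing
# only name=value pairs (attributes such as Path are not echoed back).
cookie_header = "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
request_headers = {
    "Content-Type": "application/x-www-form-urlencoded",  # urlencoded form body
    "Cookie": cookie_header,
}
print(request_headers["Cookie"])  # JSESSIONID=70E78D6970373C07A81302C7CF800349
```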
I wrote the following code, but the page returned by the GET request says "The session was not initialized", probably because the cookie was not sent back properly.
from urllib2 import Request, build_opener, HTTPCookieProcessor, HTTPHandler
import httplib, urllib, cookielib, Cookie, os
conn = httplib.HTTPConnection('webapp.pucrs.br')
#COOKIE FINDER
cj = cookielib.CookieJar()
opener = build_opener(HTTPCookieProcessor(cj),HTTPHandler())
req = Request('http://webapp.pucrs.br/consulta/principal.jsp')
f = opener.open(req)
html = f.read()
for cookie in cj:
    c = cookie
#END COOKIE FINDER
params = urllib.urlencode ({'pr1':111049631, 'pr2':<pass>})
headers = {"Content-type": "text/html",
           "Set-Cookie": "JSESSIONID=70E78D6970373C07A81302C7CF800349"}
# I couldn't set the value automatically here: the cookie object can't be converted to a string, so I change this value to the new cookie's value on every session. Any solutions?
conn.request ("POST", "/consulta/servlet/consulta.aluno.ValidaAluno",params, headers) # Validation page
resp = conn.getresponse()
temp = conn.request("GET","/consulta/servlet/consulta.aluno.Publicacoes") # desired content page
resp = conn.getresponse()
print resp.read()
Where do I put this cookie so the login is authenticated?
I recommend using mechanize: it handles sessions, cookies, and logins for you automatically. It also provides a urllib-like API and form filling, so you don't have to build the POST request by hand; mechanize constructs it from the form.
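A minimal mechanize sketch of that idea, assuming the login form is the first form on the page (`nr=0`) and that the URLs from the question are still valid; the function name `login_with_mechanize` is hypothetical. The import sits inside the function so the snippet can be loaded even where mechanize is not installed.

```python
def login_with_mechanize(user_id, password):
    """Sketch: fill in the login form and return the publications page body.

    Assumptions: the login form is the first form on the page, and the
    URLs taken from the question are still current.
    """
    import mechanize  # third-party: pip install mechanize

    br = mechanize.Browser()      # keeps cookies (e.g. JSESSIONID) between requests
    br.set_handle_robots(False)   # skip the robots.txt check
    br.open("https://webapp.pucrs.br/consulta/")
    br.select_form(nr=0)          # assumption: the login form is the first form
    br["pr1"] = user_id           # field names from the question; values are strings
    br["pr2"] = password
    br.submit()                   # POST built from the form's own action and method
    resp = br.open("http://webapp.pucrs.br/consulta/servlet/consulta.aluno.Publicacoes")
    return resp.read()
```

Usage would be something like `html = login_with_mechanize("111049631", "your-password")`.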
urllib is no good, use requests!
I would try using the requests library. The documentation is excellent, and the code ends up much cleaner than with urllib*.
Using a session (see the comment by Piotr), which handles cookies on its own, the result looks like this:
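A minimal sketch of a session-based version, assuming the URLs from the question; `login_with_requests` is a hypothetical name and this is not the answerer's original code. The `requests.Session` object stores the JSESSIONID from the first response and sends it with every later request, so no manual `Cookie` header is needed.

```python
def login_with_requests(user_id, password):
    """Sketch: log in with one requests.Session and return the page text."""
    import requests  # third-party: pip install requests

    base = "http://webapp.pucrs.br/consulta"  # URLs as given in the question
    with requests.Session() as s:  # the session's cookie jar keeps JSESSIONID
        s.get(base + "/")          # first request: server sets the session cookie
        s.post(base + "/servlet/consulta.aluno.ValidaAluno",
               data={"pr1": user_id, "pr2": password})  # sent form-encoded
        r = s.get(base + "/servlet/consulta.aluno.Publicacoes")
        return r.text
```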
It seems to work fine: I get a "Usuario inexistente" ("user does not exist") notice for pr1 = 123456789, and "Senha inválida" ("invalid password") with your user number.
You have to use the same "opener" you created for all your requests, and it will handle the cookies all by itself.
Here is an extract of something I wrote recently:
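A self-contained sketch of the same-opener idea in Python 3, using a throwaway local server instead of the university site. The handler `FakeLoginSite`, the `/login` path, and the password `secret` are hypothetical; the server only mimics the behaviour described in the question (cookie set on GET, required on POST). It shows that one opener carries the JSESSIONID across requests, while two separate openers reproduce the "The session was not initialized" error.

```python
import http.cookiejar
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

SESSION_ID = "70E78D6970373C07A81302C7CF800349"  # value from the question


class FakeLoginSite(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the login servlet: GET issues a JSESSIONID,
    POST only succeeds if that cookie comes back."""

    def _reply(self, body, cookie=None):
        self.send_response(200)
        if cookie:
            self.send_header("Set-Cookie", cookie)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        self._reply(b"login form", cookie=f"JSESSIONID={SESSION_ID}; Path=/")

    def do_POST(self):
        self.rfile.read(int(self.headers.get("Content-Length", 0)))  # drain form data
        ok = f"JSESSIONID={SESSION_ID}" in self.headers.get("Cookie", "")
        self._reply(b"logged in" if ok else b"The session was not initialized")

    def log_message(self, *args):
        pass  # silence per-request logging


def new_opener():
    # Each opener gets its own CookieJar; ProxyHandler({}) bypasses any system proxy.
    return urllib.request.build_opener(
        urllib.request.ProxyHandler({}),
        urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))


def login(get_opener, post_opener, base):
    get_opener.open(base + "/").read()  # server sets JSESSIONID here
    form = urllib.parse.urlencode({"pr1": "111049631", "pr2": "secret"}).encode()
    return post_opener.open(base + "/login", form).read().decode()  # POST


server = HTTPServer(("127.0.0.1", 0), FakeLoginSite)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_address[1]}"

shared = new_opener()
with_cookie = login(shared, shared, base)                 # same opener both times
without_cookie = login(new_opener(), new_opener(), base)  # cookie gets lost

server.shutdown()
server.server_close()
print(with_cookie)     # logged in
print(without_cookie)  # The session was not initialized
```

Passing the same opener to both steps is the whole fix; `ProxyHandler({})` is only there so a system proxy cannot intercept the local demo.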
Converting MatthieuW's answer to Python 3 gives:
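For reference, a Python 3 sketch of the shared-opener approach against the URLs from the question; `fetch_publications` is a hypothetical helper, not the code referred to above, and the site may have changed since. The main renames from Python 2 are `cookielib` → `http.cookiejar`, `urllib2` → `urllib.request`, and `urllib.urlencode` → `urllib.parse.urlencode`; in Python 3 the POST body must also be `bytes`.

```python
import http.cookiejar
import urllib.parse
import urllib.request

BASE = "http://webapp.pucrs.br/consulta"  # URLs as given in the question


def fetch_publications(user_id, password):
    """Log in and return the raw bytes of the publications page."""
    # One opener backed by one CookieJar: the JSESSIONID set on the first
    # GET is sent back automatically with the POST and the final GET.
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

    opener.open(BASE + "/principal.jsp").read()  # obtain the session cookie

    # Passing data makes this a POST; it must be bytes, hence .encode().
    form = urllib.parse.urlencode({"pr1": user_id, "pr2": password}).encode()
    opener.open(BASE + "/servlet/consulta.aluno.ValidaAluno", form).read()

    with opener.open(BASE + "/servlet/consulta.aluno.Publicacoes") as resp:
        return resp.read()  # bytes; decode with the page's charset if needed
```

Usage would be something like `page = fetch_publications("111049631", "your-password")`.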