Question:

I'm trying to create a Python program that logs in to my university's site using my ID and password. This is the official login page: https://webapp.pucrs.br/consulta/

As you may notice, the two fields are named pr1 and pr2, and the page uses POST to send the data. There is also a cookie that is downloaded when the page loads: a JSESSIONID containing a random value that, as I understand it, has to be sent back in the headers of the POST request to authenticate the login.

I wrote the following code, but the page returned by the GET request says "The session was not initialized", probably because the cookie was not sent back properly.

from urllib2 import Request, build_opener, HTTPCookieProcessor, HTTPHandler
import httplib, urllib, cookielib, Cookie, os

conn = httplib.HTTPConnection('webapp.pucrs.br')

#COOKIE FINDER
cj = cookielib.CookieJar()
opener = build_opener(HTTPCookieProcessor(cj),HTTPHandler())
req = Request('http://webapp.pucrs.br/consulta/principal.jsp')
f = opener.open(req)
html = f.read()
for cookie in cj:
    c = cookie
#END COOKIE FINDER

params = urllib.urlencode ({'pr1':111049631, 'pr2':<pass>})
headers = {"Content-type":"text/html",
           "Set-Cookie" : "JSESSIONID=70E78D6970373C07A81302C7CF800349"}
            # I couldn't set the value automatically here: the cookie object can't be converted to a string, so I change this value on every session to the new cookie's value. Any solutions?

conn.request ("POST", "/consulta/servlet/consulta.aluno.ValidaAluno",params, headers) # Validation page
resp = conn.getresponse()

temp = conn.request("GET","/consulta/servlet/consulta.aluno.Publicacoes") # desired content page
resp = conn.getresponse()

print resp.read()

Where do I put this cookie so the login is authenticated?

Answer 1:

I would try using the requests library. The documentation is excellent, and the code ends up much cleaner than with urllib*.

$ pip install requests

Using a session (see comment by Piotr), which handles cookies on its own, the result looks like this:

import requests
url_0 = "http://webapp.pucrs.br/consulta/principal.jsp"
url = "https://webapp.pucrs.br/consulta/servlet/consulta.aluno.ValidaAluno"
data = {"pr1": "123456789", "pr2": "1234"}

s = requests.session()
s.get(url_0)
r = s.post(url, data)

It seems to work fine: I get a "Usuario inexistente" notice for pr1 123456789 and "Senha inválida" with your user number.

Answer 2:

You have to use the same "opener" you have created for all your requests, and it will handle the cookies all by itself.

Here is an extract of something I wrote recently:

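import cookielib
import urllib
import urllib2

# Build the opener once; its cookie jar stores and resends cookies for you
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookielib.CookieJar()))

# then for all requests

if postData:
    pData = urllib.urlencode(postData)
else:
    pData = None

# url, postData and self._headers come from the surrounding class in the original program
httpReq = urllib2.Request(url, pData, self._headers)
page = opener.open(httpReq)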

Answer 3:

Converting MatthieuW's answer to Python 3 gives:

import http.cookiejar
import urllib.parse
import urllib.request

opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))
# then for all requests

if postData:
    # In Python 3 the request body must be bytes, so encode the form data
    pData = urllib.parse.urlencode(postData).encode("ascii")
else:
    pData = None

httpReq = urllib.request.Request(url, pData)
page = opener.open(httpReq)
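
To check that a shared opener really sends the session cookie back, here is a minimal Python 3 sketch that runs against a throwaway local server instead of the university site. The handler class, the DEMO123 cookie value and the reply strings are all made up for illustration; only the field names pr1 and pr2 come from the question. It also shows that each cookie in the jar exposes .name and .value, which is one way around the "can't be converted to string" problem.

```python
import http.cookiejar
import http.server
import threading
import urllib.parse
import urllib.request

SESSION_ID = "DEMO123"  # made-up value standing in for the site's JSESSIONID


class DemoHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for the login site: GET sets the cookie, POST checks it."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Set-Cookie", "JSESSIONID=%s; Path=/" % SESSION_ID)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_POST(self):
        # Consume the form body, then report whether the session cookie came back.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        cookie = self.headers.get("Cookie", "")
        ok = ("JSESSIONID=" + SESSION_ID) in cookie
        body = b"session ok" if ok else b"session not initialized"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def post_form(opener, url, fields):
    """POST url-encoded form fields through the given opener and return the reply text."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    with opener.open(urllib.request.Request(url, data)) as resp:
        return resp.read().decode("ascii")


# Port 0 lets the OS pick a free port for the local demo server.
server = http.server.HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d/" % server.server_port

# ProxyHandler({}) ignores any system proxy so the demo stays local.
no_proxy = urllib.request.ProxyHandler({})
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(no_proxy, urllib.request.HTTPCookieProcessor(jar))

opener.open(base).close()                 # first request: the server sets the cookie
cookies = {c.name: c.value for c in jar}  # cookie objects expose .name and .value
form = {"pr1": "123456789", "pr2": "1234"}
with_cookie = post_form(opener, base, form)

fresh = urllib.request.build_opener(no_proxy)  # no cookie jar, so nothing is sent back
without_cookie = post_form(fresh, base, form)

server.shutdown()
server.server_close()
print(cookies, with_cookie, without_cookie)
```

The second opener has no cookie jar, so the server never sees the JSESSIONID and answers as if the session was never initialized. That is the same symptom the question describes when a separate connection is used for the POST.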

Answer 4:

I recommend using mechanize. It automatically handles sessions, cookies and logins for you. It also provides a urllib-like API and features such as form filling, so you don't have to build the right POST request yourself; mechanize constructs it for you.

Answer 5:

urllib is no good, use requests!

import requests

url = "https://webapp.pucrs.br/consulta/principal.jsp"
s = requests.Session()

# pr1 and pr2 are the field names given in the question
p = dict(pr1 = 'dd', pr2 = 'cc')
r = s.get(url, params = p)
# pass cert='/path/to/certificate.pem' if the site needs a client certificate
# otherwise use verify = False to bypass SSL verification

c = r.cookies

# Then send the next request using those same cookies
# (other_url stands for whichever page you want to fetch next)

r = requests.get(other_url, cookies = c, verify = False)