
Flurry scraping using python3 requests.Session()

Posted 2019-07-23 06:12

Question:

This seems really straightforward, but for some reason this isn't connecting to Flurry correctly and I am unable to scrape the data.

    import requests

    loginurl = "https://dev.flurry.com/secure/loginPage.do"
    csvurl = "https://dev.flurry.com/eventdata"

    session = requests.Session()
    login = session.post(loginurl, data={'loginEmail': 'user', 'loginPassword': 'pass'})
    data = session.get(csvurl)

Every time I try this, I get redirected back to the login screen (loginurl) without fetching the new data. Has anyone been able to connect to Flurry like this successfully before?

Any and all help would be greatly appreciated, thanks.

Answer 1:

Two more form fields need to be populated: struts.token.name, and a second field whose name is the value of struts.token.name (i.e. token) and which carries the token itself. You also have to post to loginAction.do rather than loginPage.do.

You can do an initial GET, parse the values using bs4, and then post the data:

    from bs4 import BeautifulSoup
    import requests

    # Post to loginAction.do, not loginPage.do
    loginurl = "https://dev.flurry.com/secure/loginAction.do"
    csvurl = "https://dev.flurry.com/eventdata"
    data = {'loginEmail': 'user', 'loginPassword': 'pass'}

    with requests.Session() as session:
        session.headers.update({
            "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.82 Safari/537.36"})

        # Initial GET to pick up the session cookies and the hidden token fields
        soup = BeautifulSoup(session.get(loginurl).content, "html.parser")
        # struts.token.name holds the name of the field that carries the token
        name = soup.select_one('input[name="struts.token.name"]')["value"]
        data["struts.token.name"] = name
        data[name] = soup.select_one('input[name="{}"]'.format(name))["value"]
        login = session.post(loginurl, data=data)
        # Reuse the session so the login cookies are sent, as in the question
        csv_response = session.get(csvurl)
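
To see what the two extra fields look like without hitting the site, here is a minimal offline sketch of the same extraction step. It uses only the standard library's html.parser instead of bs4, so it runs without extra dependencies. The form markup, the token value ABC123, and the names InputCollector and build_login_payload are made up for illustration; the real login page may be laid out differently.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collect name -> value for every <input> element in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <input ... />
        if tag == "input":
            attr = dict(attrs)
            if "name" in attr:
                self.fields[attr["name"]] = attr.get("value") or ""


def build_login_payload(html, email, password):
    """Merge the credentials with the two Struts token fields found in html."""
    collector = InputCollector()
    collector.feed(html)
    # struts.token.name holds the NAME of the field that carries the token
    token_field = collector.fields["struts.token.name"]
    return {
        "loginEmail": email,
        "loginPassword": password,
        "struts.token.name": token_field,
        token_field: collector.fields[token_field],
    }


# Hypothetical login form, invented for this example.
SAMPLE_FORM = """
<form action="loginAction.do" method="post">
  <input type="hidden" name="struts.token.name" value="token" />
  <input type="hidden" name="token" value="ABC123" />
  <input type="text" name="loginEmail" />
  <input type="password" name="loginPassword" />
</form>
"""

payload = build_login_payload(SAMPLE_FORM, "user", "pass")
print(payload)
```

Against the live site you would presumably pass session.get(loginurl).text as the html argument and send the result with session.post(loginurl, data=payload), using the same session for both requests so the cookies match the token.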