Yahoo login using rvest

Posted 2019-04-02 04:53

Question:

Yahoo recently changed its authentication to a two-step process. Now, when I log in to a Yahoo site, I enter my username and am then asked to open the Yahoo mobile app to get a code. Alternatively, you can have it email or text you as another way around this. As a result, code that used to log in to Yahoo sites programmatically no longer works: the code below just redirects back to the login form. I've tried with and without a user agent string, and with and without countrycode=1 in the form values. I'm fine with entering a code after looking at my mobile app, but I'm never forwarded to the page where that code is entered. How do we log in to Yahoo these days using R?

url <- "http://mail.yahoo.com"
uastring <- "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36"

s <- rvest::html_session(url, httr::user_agent(uastring))
s_form <- rvest::html_form(s)[[1]]
filled_form <- rvest::set_values(s_form, username="myusername", 
                                 passwd="mypassword")
out <- rvest::submit_form(session=s, filled_form, submit="signin",
                          httr::add_headers("Content-Length"=0))

Answer 1:

Okay, I've stumbled upon the answer here. I was using the httr::add_headers("Content-Length"=0) in response to a warning that rvest would throw: Warning message: In request_POST(session, url = url, body = request$values, encode = request$encode, : Length Required (HTTP 411).

As it turns out, everything works fine despite the warning, and the login actually fails if I add the Content-Length header. So my code to log in to Yahoo ends up looking like this:

  # Wrapped in a function so that the return() calls below are valid;
  # yahoo_login is an illustrative name, not from the original post.
  yahoo_login <- function(username, league_id) {
    uastring <- "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36"
    url <- paste0("http://football.fantasysports.yahoo.com/f1/", league_id)

    # Step 1: submit the username only (no password is needed)
    s <- rvest::html_session(url, httr::user_agent(uastring))
    myform <- rvest::html_form(s)[[1]]
    myform <- rvest::set_values(myform, username=username)
    # Suppress the "Length Required (HTTP 411)" warning; do not add a Content-Length header
    s <- suppressWarnings(rvest::submit_form(s, myform, submit="signin"))
    s <- rvest::jump_to(s, s$response$url)

    # Step 2: the next page should contain a form with a "code" field
    myform <- rvest::html_form(s)[[1]]
    if(!("code" %in% names(myform$fields))){
      print("Unable to log in")
      return(NULL)
    }
    code <- readline(prompt="In your Yahoo app, find and click on the Account Key icon.\nGet the 8 character code and\nenter it here: ")
    myform <- rvest::set_values(myform, code=code)
    s <- suppressWarnings(rvest::submit_form(s, myform, submit="verify"))
    # Still being on the verify URL means the code was rejected
    if(grepl("authorize\\/verify", s$url)){
      print("Wrong code entered, unable to log in")
      return(NULL)
    }
    print("Login successful")
    rvest::jump_to(s, s$response$url)
  }

  s <- yahoo_login("some_username@yahoo.com",
                   "some league id to complete the fantasy football url")
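
The two checks in the code above (is there a "code" field, and did the verify step bounce back to the verify URL) can be pulled into small helpers and exercised without touching the network. This is only a minimal sketch: the helper names has_code_field and verification_failed are hypothetical, and the field names and example.com URLs in the usage lines are made up for illustration.

```r
# Hypothetical helpers; the names are illustrative and not part of rvest.

# TRUE when the form's field names include the "code" input that the
# second login step is expected to show.
has_code_field <- function(field_names) {
  "code" %in% field_names
}

# TRUE when the URL reached after submitting the code still contains
# "authorize/verify", which the login code treats as a rejected code.
verification_failed <- function(url) {
  grepl("authorize/verify", url, fixed = TRUE)
}

# Usage with made-up inputs
has_code_field(c("code", "verify"))
verification_failed("https://example.com/authorize/verify")
```

In the real login flow these would be called as has_code_field(names(myform$fields)) and verification_failed(s$url).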

It's a two-step process: submit your username, then go to your Yahoo app to get the login code. No Yahoo password is needed. I use readline to get the login code. It seems to work well; I'm able to scrape my fantasy football data after completing the login. It's just curious that the warning asking for a Content-Length header leads you down a path that doesn't work. By the way, the same situation applies when trying to log in to Google: you have to ignore the warning, and it works fine.
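
One plausible reading of why forcing Content-Length to 0 breaks the login (an assumption on my part, not something confirmed above): the submitted form body is not empty, so a header claiming zero bytes contradicts what is actually sent. The sketch below builds a URL-encoded body in base R for a made-up username and counts its bytes. It does not use httr or rvest, and the variable names are illustrative.

```r
# Made-up form value, for illustration only
form_values <- c(username = "some_username@yahoo.com")

# Percent-encode each value, then join as key=value pairs the way a
# URL-encoded form body is laid out
encoded <- vapply(form_values, utils::URLencode, character(1), reserved = TRUE)
body <- paste0(names(encoded), "=", encoded, collapse = "&")

# Number of bytes a matching Content-Length header would have to report
body_length <- nchar(body, type = "bytes")
body_length > 0
```

Since the body has a non-zero length, a hand-set "Content-Length"=0 cannot describe it, which fits the observation that leaving the header alone works.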



Tags: r rvest httr