Why is my urlFetchApp function failing to successf

Posted 2019-02-19 00:21

I'm trying to use Google Apps Script to log in to an ASP.NET website and scrape some data that I typically have to retrieve manually. I've used Chrome Developer Tools to get the correct payload names (TEXT_Username, TEXT_Password, __VIEWSTATE, __VIEWSTATEGENERATOR), and I also obtained an ASP.NET session ID to send along with my POST request.

When I run my function(s), I get a response code of 200 if followRedirects is set to false and a response code of 302 if followRedirects is set to true. Unfortunately, in neither case do the functions successfully authenticate with the website. Instead, the HTML returned is that of the login page.

I've tried different header variants and parameters, but I can't seem to log in successfully.

A couple of other points: when I log in through Chrome with the Developer Tools open, the response code appears to be 302 Found.

Does anyone have any suggestions on how I can successfully log in to this site? Do you see any errors in my functions that could be the cause of my problems? I'm open to any and all suggestions.

My GAS functions follow:

    function login(cookie, viewState, viewStateGenerator) {
      var payload = {
        "__VIEWSTATE": viewState,
        "__VIEWSTATEGENERATOR": viewStateGenerator,
        "TEXT_Username": "myUserName",
        "TEXT_Password": "myPassword",
      };
      var header = {'Cookie': cookie};
      Logger.log(header);
      var options = {
        "method": "post",
        "payload": payload,
        "followRedirects": false,
        "headers": header
      };
      var browser = UrlFetchApp.fetch("http://tnetwork.trakus.com/tnet/Login.aspx?", options);
      Utilities.sleep(1000);
      var html = browser.getContentText();
      var response = browser.getResponseCode();
      var cookie2 = browser.getAllHeaders()['Set-Cookie'];
      Logger.log(response);
      Logger.log(html);
    }

    function loginPage() {
      var options = {
        "method": "get",
        "followRedirects": false,
      };
      var browser = UrlFetchApp.fetch("http://tnetwork.trakus.com/tnet/Login.aspx?", options);
      var html = browser.getContentText();
      // Utilities.sleep(500);
      var response = browser.getResponseCode();
      var cookie = browser.getAllHeaders()['Set-Cookie'];
      login(cookie);
      var regExpGen = new RegExp("<input type=\"hidden\" name=\"__VIEWSTATEGENERATOR\" id=\"__VIEWSTATEGENERATOR\" value=\"(.*)\" \/>");
      var viewStateGenerator = regExpGen.exec(html)[1];
      var regExpView = new RegExp("<input type=\"hidden\" name=\"__VIEWSTATE\" id=\"__VIEWSTATE\" value=\"(.*)\" \/>");
      var viewState = regExpView.exec(html)[1];
      var response = login(cookie, viewState, viewStateGenerator);
      return response;
    }

I call the script by running the loginPage() function. This function obtains the cookie (session id) and then calls the login function and passes along the session id (cookie).
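
For reference, the hidden-field extraction that loginPage() performs can be sketched as a small helper. This is only an illustrative sketch: extractHiddenField is a hypothetical name that does not appear in the script above, and it assumes the name attribute comes before the value attribute inside the tag, as it does in the regular expressions above. It captures with [^"]* instead of (.*), so the match stops at the first closing quote even if several inputs share a line.

```javascript
// Hypothetical helper (not part of the original script): pull the value of a
// hidden <input> such as __VIEWSTATE out of the login page HTML.
// Assumes the name attribute appears before the value attribute.
function extractHiddenField(html, name) {
  // Escape regex metacharacters in the field name.
  var escaped = name.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // [^>]* keeps the match inside one tag; [^"]* stops at the closing quote.
  var pattern = new RegExp('<input[^>]*name="' + escaped + '"[^>]*value="([^"]*)"');
  var match = pattern.exec(html);
  // Return null instead of throwing when the field is missing.
  return match ? match[1] : null;
}
```

With this, the two lookups would become extractHiddenField(html, "__VIEWSTATE") and extractHiddenField(html, "__VIEWSTATEGENERATOR"), and a null result signals that the page did not contain the expected field.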

Here is what I see in the Network tab of Chrome Developer Tools when I log in using Google's Chrome browser:

    Remote Address: 66.92.89.141:80
    Request URL: http://tnetwork.trakus.com/tnet/Login.aspx
    Request Method: POST
    Status Code:302 Found

    **Request Headers**
    Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
    Accept-Encoding:gzip, deflate
    Accept-Language: en-US,en;q=0.8
    Cache-Control:max-age=0
    Connection:keep-alive
    Content-Length: 252
    Content-Type:application/x-www-form-urlencoded
    Cookie: ASP.NET_SessionId=jayaejut5hopr43xkp0vhzu4; userCredentials=username=myUsername; .ASPXAUTH=A54B65A54A850901437E07D8C6856B7799CAF84C1880EEC530074509ADCF40456FE04EC9A4E47D1D359C1645006B29C8A0A7D2198AA1E225C636E7DC24C9DA46072DE003EFC24B9FF2941755F2F290DC1037BB2B289241A0E30AF5CB736E6E1A7AF52630D8B31318A36A4017893452B29216DCF2; __utma=260442568.1595796669.1421539534.1425211879.1425214489.16; __utmc=260442568; __utmz=260442568.1421539534.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utma=190106350.1735963725.1421539540.1425152706.1425212185.18; __utmc=190106350; __utmz=190106350.1421539540.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)
    Host:tnetwork.trakus.com
    Origin:http://tnetwork.trakus.com
    Referer:http://tnetwork.trakus.com/tnet/Login.aspx?
    User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36

    **Form Data**
    __VIEWSTATE: O7YCnq5e471jHLqfPre/YW+dxYxyhoQ/VetOBeA1hqMubTAAUfn+j9HDyVeEgfAdHMl+2DG/9Gw2vAGWYvU97gml5OXiR9E/9ReDaw9EaQg836nBvMMIjE4lVfU=
    __VIEWSTATEGENERATOR:F4425990
    TEXT_Username:myUsername
    TEXT_Password:myPassword
    BUTTON_Submit: Log In

Update: It appears that the website is using an HttpOnly cookie. I suspect I am not capturing the whole cookie, so my Cookie header is incomplete. I believe I need to set followRedirects to false and handle the redirect and cookies manually. I'm currently researching this process, but I welcome input from anyone who has been down this road.
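
Handling the cookies manually could look roughly like the sketch below. mergeCookies and cookieHeader are hypothetical helper names, not part of UrlFetchApp. The sketch assumes the Set-Cookie entry from getAllHeaders() is either a single string or an array of strings, and it keeps only the name=value part of each cookie, so attributes such as path and HttpOnly are dropped. As far as I know, HttpOnly only stops in-page JavaScript in a browser from reading a cookie, so it should not by itself hide the cookie from a server-side fetch.

```javascript
// Hypothetical helpers (not from the original post).

// Merge the Set-Cookie value(s) of one response into a name -> value map,
// so cookies from several responses (e.g. the GET and the POST) accumulate.
function mergeCookies(jar, setCookie) {
  if (!setCookie) return jar;
  // Normalize to an array: the header may be one string or several.
  var list = typeof setCookie === 'string' ? [setCookie] : setCookie;
  for (var i = 0; i < list.length; i++) {
    // Keep only "name=value"; drop attributes such as path or HttpOnly.
    var pair = list[i].split(';')[0];
    var eq = pair.indexOf('=');
    if (eq > 0) {
      jar[pair.slice(0, eq).trim()] = pair.slice(eq + 1).trim();
    }
  }
  return jar;
}

// Serialize the map into a value for the Cookie request header.
function cookieHeader(jar) {
  var parts = [];
  for (var name in jar) {
    if (Object.prototype.hasOwnProperty.call(jar, name)) {
      parts.push(name + '=' + jar[name]);
    }
  }
  return parts.join('; ');
}
```

The idea would be to call mergeCookies(jar, response.getAllHeaders()['Set-Cookie']) after each response and pass cookieHeader(jar) as the Cookie header of the next request, so that a session cookie from the login page and any authentication cookie from the POST are sent together.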

2 Answers
疯言疯语
#2 · 2019-02-19 00:54

I notice that the provided Chrome payload includes BUTTON_Submit: Log In but your POST payload does not. I have found that for POSTs in GAS things go much more smoothly if I explicitly set a submit variable in my payload objects. In any case, if you're trying to emulate what Chrome is doing, this is a good first step.

So in your case, it's a one-line change:

var payload =
   {
     "__VIEWSTATE" : viewState,
     "__VIEWSTATEGENERATOR" : viewStateGenerator,
     "TEXT_Username" : "myUserName",
     "TEXT_Password" : "myPassword",
     "BUTTON_Submit" : "Log In"
   };
We Are One
#3 · 2019-02-19 00:58

I was finally able to log in to the page successfully. The issue seems to be that UrlFetchApp was unable to follow the redirect. I credit this Stack Overflow post: "how to fetch a wordpress admin page using google apps script".

This post described the following process that led to my successful login:

  1. Set followRedirects to false.
  2. Submit the POST and capture the cookies.
  3. Use the captured cookies to issue a GET to the appropriate URL.

Here is the relevant code:

var url = "http://myUrl.com/";
   var options = {
      "method": "post",
      "payload": {
      "TEXT_Username" : "myUserName",
      "TEXT_Password" : "myPassword",
      "BUTTON_Submit" : "Log In",
      },
      // Not a documented UrlFetchApp option; appears to be carried over from the WordPress example
      "testcookie": 1,
      "followRedirects": false
   };
   var response = UrlFetchApp.fetch(url, options);
   if ( response.getResponseCode() == 200 ) {
     // Incorrect user/pass combo
   } else if ( response.getResponseCode() == 302 ) {
     // Logged-in
     var headers = response.getAllHeaders();
     if ( typeof headers['Set-Cookie'] !== 'undefined' ) {
        // Make sure that we are working with an array of cookies
        var cookies = typeof headers['Set-Cookie'] == 'string' ? [ headers['Set-Cookie'] ] : headers['Set-Cookie'];
        for (var i = 0; i < cookies.length; i++) {
           // We only need the cookie's value - it might have path, expiry time, etc here
           cookies[i] = cookies[i].split( ';' )[0];  
        };

        url = "http://myUrl/Calendar.aspx";
        options = {
            "method": "get",
            // Set the cookies so that we appear logged-in
            "headers": {
               "Cookie": cookies.join(';') 
            }
        }
      ...
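
The snippet above is cut off before the final request. A minimal sketch of the whole three-step flow might look like the following. loginThenFetch is a hypothetical wrapper, and its fetchFn parameter is an assumed injection point that lets the flow be exercised without network access; inside Apps Script it falls back to UrlFetchApp.fetch. It treats anything other than a 302 as a failed login, which matches the check above but is an assumption about the target site.

```javascript
// Hypothetical wrapper (not from the original answer).
function loginThenFetch(loginUrl, targetUrl, payload, fetchFn) {
  // Assumed injection point; defaults to UrlFetchApp.fetch in Apps Script.
  var doFetch = fetchFn || function (url, options) {
    return UrlFetchApp.fetch(url, options);
  };

  // Steps 1 and 2: post the form without following the redirect.
  var loginResponse = doFetch(loginUrl, {
    method: 'post',
    payload: payload,
    followRedirects: false
  });
  if (loginResponse.getResponseCode() !== 302) {
    return null; // no redirect: treat as a failed login
  }

  // Capture the cookies, keeping only the name=value part of each.
  var setCookie = loginResponse.getAllHeaders()['Set-Cookie'] || [];
  var cookies = typeof setCookie === 'string' ? [setCookie] : setCookie;
  var pairs = [];
  for (var i = 0; i < cookies.length; i++) {
    pairs.push(cookies[i].split(';')[0]);
  }

  // Step 3: request the protected page with the captured cookies.
  var pageResponse = doFetch(targetUrl, {
    method: 'get',
    headers: { Cookie: pairs.join('; ') }
  });
  return pageResponse.getContentText();
}
```

For a site like the one in the question, the payload would presumably also need the __VIEWSTATE and __VIEWSTATEGENERATOR values, and any session cookie issued with the login page would have to be sent with the POST as well; this sketch leaves those out.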