Scrape ASP.NET website with paging

Posted 2019-04-12 06:48

Question:

I am trying to scrape a basic ASP.NET directory website that uses paging.

The website has more than 50 pages consisting of up to 10 paging links on any one page.

I'm using Fiddler to help replicate all the parameters, variables, form fields, cookies, etc. that are posted by a browser. The only difference I see between two posts is the __EVENTVALIDATION value.

With HttpWebRequest I always get the same value, whereas in the browser it changes on each click.

With HttpWebRequest I get the first 10 pages correctly, but all the following pages redirect me to the home page. Below is the postback JavaScript, which is always the same for the links after the first 10.

javascript:__doPostBack('CT_Main_2$gvDirectorySearch$ctl53$ctl00$ctl11','')

Any ideas why __EVENTVALIDATION does not change with HttpWebRequest?

Answer 1:

From your description, it sounds like an anti-forgery token. An anti-forgery token is used to prevent cross-site request forgery (XSRF) attacks.

For a site to take advantage of anti-forgery tokens, it will typically set a cookie in the client's browser, and it will expect the very same token as a parameter within the form that is being posted.
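
With HttpWebRequest, cookies are only stored and replayed when a CookieContainer is assigned to each request, so that is worth checking first. The following is a minimal sketch, assuming one container shared by all requests; the CookieSession class and its Create method are hypothetical names, not part of your code.

```csharp
using System;
using System.Net;

// Hypothetical helper: every request created here shares one cookie jar.
public static class CookieSession
{
    // Cookies the server sets on one response are sent back on later requests.
    public static readonly CookieContainer Cookies = new CookieContainer();

    public static HttpWebRequest Create(string uri)
    {
        var request = (HttpWebRequest)WebRequest.Create(uri);
        // Without a container, HttpWebRequest neither stores nor resends cookies.
        request.CookieContainer = Cookies;
        return request;
    }
}
```

Creating every GET and POST through CookieSession.Create(...) keeps things like the session cookie consistent across the whole paging run.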

To overcome it, you'll need to send back, on the subsequent request, the token that the server set. You'll also need to scan the HTML form for the same token and include that in the posted data as well.
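
As a rough sketch of that idea applied to a paging link: read the hidden fields from the page you just received, take the target out of the __doPostBack link, and build the next POST body from both. The PostBackSketch class and its method names are hypothetical, and the regex assumes the attribute order that WebForms usually emits for hidden inputs (type, name, id, value), so treat it as a starting point rather than a robust parser.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for building a postback body from the latest page.
public static class PostBackSketch
{
    // Pulls target and argument out of "javascript:__doPostBack('target','argument')".
    public static KeyValuePair<string, string> ParsePostBack(string href)
    {
        Match m = Regex.Match(href, @"__doPostBack\('([^']*)','([^']*)'\)");
        if (!m.Success) throw new FormatException("Not a __doPostBack link: " + href);
        return new KeyValuePair<string, string>(m.Groups[1].Value, m.Groups[2].Value);
    }

    // Collects name -> value for every hidden input in the given HTML.
    // Assumption: attributes appear as type, name, (id), value.
    public static Dictionary<string, string> HiddenFields(string html)
    {
        var fields = new Dictionary<string, string>();
        string pattern = @"<input\s+type=""hidden""\s+name=""([^""]+)""[^>]*?value=""([^""]*)""";
        foreach (Match m in Regex.Matches(html, pattern))
        {
            fields[m.Groups[1].Value] = WebUtility.HtmlDecode(m.Groups[2].Value);
        }
        return fields;
    }

    // Combines the current hidden fields with the link's target and argument.
    public static string BuildBody(string html, string href)
    {
        Dictionary<string, string> fields = HiddenFields(html);
        KeyValuePair<string, string> postBack = ParsePostBack(href);
        fields["__EVENTTARGET"] = postBack.Key;
        fields["__EVENTARGUMENT"] = postBack.Value;
        return string.Join("&", fields.Select(
            f => WebUtility.UrlEncode(f.Key) + "=" + WebUtility.UrlEncode(f.Value)));
    }
}
```

The important part is that BuildBody is called with the HTML of the most recent response each time, so __VIEWSTATE and __EVENTVALIDATION always come from the page that actually rendered the link being followed.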


EDIT

So I've dug a little deeper and created an ASP.NET WebForms site and tried to replicate your issue but couldn't... on each request I managed to extract the __EVENTVALIDATION field.

Still, here's my code in case you find any of it useful...

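// Namespaces the snippet relies on; HttpUtility needs a reference to System.Web.
// Main, Get and Post are assumed to sit in the same class (or a LINQPad "C# Program" query).
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;
using System.Web;

void Main()
{
    // GET the page first so the hidden fields (including __EVENTVALIDATION) can be read from it.
    var firstResponse = this.Get(@"http://localhost:7428/Account/Login");

    firstResponse.FormValues["ctl00$MainContent$Email"] = "email@example.com";
    firstResponse.FormValues["ctl00$MainContent$Password"] = "password";

    // Post the extracted hidden fields back together with the filled-in inputs.
    string secondRequestPostdata = firstResponse.ToString();
    var secondResponse = this.Post(@"http://localhost:7428/Account/Login", secondRequestPostdata);

    // Print the __EVENTVALIDATION value extracted from each response.
    Console.WriteLine(firstResponse.FormValues["__EVENTVALIDATION"]);
    Console.WriteLine(secondResponse.FormValues["__EVENTVALIDATION"]);
}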


public FormData Get(string uri)
{
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri); // use the uri argument, not a hard-coded URL
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (Stream stream = response.GetResponseStream())
    using (StreamReader reader = new StreamReader(stream))
    {
        return  new FormData(reader.ReadToEnd());
    }
}

public FormData Post(string uri, string postContent)
{
    byte[] formBytes = Encoding.UTF8.GetBytes(postContent);

    var request = (HttpWebRequest)WebRequest.Create(uri); // use the uri argument, not a hard-coded URL
    request.Method = "POST";
    request.ContentType = "application/x-www-form-urlencoded";
    request.ContentLength = formBytes.Length;

    using (Stream stream = request.GetRequestStream())
    {
        stream.Write(formBytes, 0, formBytes.Length);
    }

    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (Stream stream = response.GetResponseStream())
    using (StreamReader reader = new StreamReader(stream))
    {
        return new FormData(reader.ReadToEnd());
    }
}

public class FormData
{
    public FormData(string html)
    {
        this.Html = html;

        this.FormValues = new Dictionary<string, string>();
        this.FormValues["__EVENTTARGET"]                = this.Extract(@"__EVENTTARGET");
        this.FormValues["__EVENTARGUMENT"]              = this.Extract(@"__EVENTARGUMENT");
        this.FormValues["__VIEWSTATE"]                  = this.Extract(@"__VIEWSTATE");
        this.FormValues["__VIEWSTATEGENERATOR"]         = this.Extract(@"__VIEWSTATEGENERATOR");
        this.FormValues["__EVENTVALIDATION"]            = this.Extract(@"__EVENTVALIDATION");
        this.FormValues["ctl00$MainContent$Email"]      = string.Empty;
        this.FormValues["ctl00$MainContent$Password"]   = string.Empty;
        this.FormValues["ctl00$MainContent$ctl05"]      = "Log in";
    }

    public string Html { get; set; }

    private string Extract(string id)
    {
        return Regex.Match(this.Html, @"id=""" + id + @""" value=""([^""]*)")
                    .Groups[1]
                    .Value;
    }

    public Dictionary<string, string> FormValues { get;set; }

    public override string ToString()
    {
        var formData = this.FormValues.Select(form => HttpUtility.UrlEncode(form.Key) + "=" + HttpUtility.UrlEncode(form.Value));

        return string.Join("&", formData);
    }
}