Making HttpWebRequest wait for content loaded by AJAX

Posted 2019-07-15 04:31

Question:

I am trying to use HttpWebRequest to get a complete web page, but the response I get is incomplete because part of the page is loaded by AJAX, and that part takes a while to load (usually 10 to 30 seconds). Is there a way to set or force HttpWebRequest to wait a number of seconds before retrieving the content of the page?

Any help would be greatly appreciated!

Thanks

Answer 1:

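If the site is using AJAX to load data, then HttpWebRequest on its own will not get you the full page. The reason is that the site is probably running a script after the page loads (for example from a window.onload handler) that issues a further GET request to the server. This happens inside the JavaScript execution in the browser. HttpWebRequest only downloads the HTML that the server returns and never runs scripts, so waiting longer will not make the missing content appear.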

If you want to get this to work, you have two options.

1) Issue the request for the container page (the main page that you access), and a second request for the contained page that the container page loads using AJAX. To find the URL of the contained page, use Firefox with the Firebug plugin (or the network panel of any browser's developer tools) and watch the requests the page makes.

2) Use a higher-level framework that supports JavaScript and HTML/DOM. For example, you could try the WebBrowser control from Microsoft, hosted in .NET. You could also use another framework, provided it executes JavaScript and understands HTML.
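
Option 1 might look something like the sketch below. It assumes you have already found the inner URL in the browser's network panel; the class name AjaxFetch, its methods, and the example URL are hypothetical, and the X-Requested-With header is only a guess at what the target site expects (many JavaScript libraries send it, but not every site checks it).

```csharp
using System;
using System.IO;
using System.Net;

static class AjaxFetch
{
    // Builds a GET request for the inner URL that the page normally loads via AJAX.
    // Assumption: the site accepts a plain GET and may look at X-Requested-With.
    public static HttpWebRequest BuildAjaxRequest(string innerUrl)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(innerUrl);
        request.Method = WebRequestMethods.Http.Get;
        request.Accept = "application/json, text/html, */*";
        request.Headers.Add("X-Requested-With", "XMLHttpRequest");
        return request;
    }

    // Sends the request and returns the response body as a string.
    public static string Fetch(string innerUrl)
    {
        HttpWebRequest request = BuildAjaxRequest(innerUrl);
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }
}
```

A call such as AjaxFetch.Fetch("http://example.com/ajax/data") would then return whatever the inner endpoint sends back, often JSON or an HTML fragment rather than a full page. If the site requires cookies from the container page, the two requests would also need to share a CookieContainer.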



Answer 2:

Below is my code

// Requires: using System; using System.IO; using System.Net; using System.Text;
// Posts form data to url and returns the response body, or "" if the request fails.
static string PostForHtml(string url, string postData)
{
    string html = "";

    try
    {
        byte[] data = Encoding.UTF8.GetBytes(postData);
        HttpWebRequest httpWebRequest = (HttpWebRequest)WebRequest.Create(url);
        httpWebRequest.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 (.NET CLR 3.5.30729)";
        httpWebRequest.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        httpWebRequest.ContentType = "application/x-www-form-urlencoded";

        httpWebRequest.KeepAlive = false;
        httpWebRequest.Method = WebRequestMethods.Http.Post;

        httpWebRequest.AllowAutoRedirect = true;
        httpWebRequest.Headers.Add("Accept-Language", "en-us");
        httpWebRequest.ContentLength = data.Length;

        // Write the form data into the request body.
        using (Stream dataStream = httpWebRequest.GetRequestStream())
        {
            dataStream.Write(data, 0, data.Length);
        }

        // Read the whole response body into one string; the using blocks
        // close the response, stream, and reader even if reading fails.
        using (HttpWebResponse httpWebResponse = (HttpWebResponse)httpWebRequest.GetResponse())
        using (Stream webResponseStream = httpWebResponse.GetResponseStream())
        using (StreamReader streamReader = new StreamReader(webResponseStream))
        {
            html = streamReader.ReadToEnd();
        }
    }
    catch (WebException ex)
    {
        // Report the failure instead of silently swallowing it.
        Console.Error.WriteLine("Request failed: " + ex.Message);
    }

    return html;
}


Answer 3:

Why not go with something simple like this:

 WebClient w = new WebClient();
 string pageSource = w.DownloadString(URL);

Try this, and if it works, add the rest of your settings (headers and so on) to the WebClient object.
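
As a rough sketch of carrying settings over to WebClient, the code below sets two of the headers used in the HttpWebRequest code from the previous answer. The class name PageDownloader and its methods are made up for illustration, and the shortened User-Agent string is a placeholder. Note that WebClient, like HttpWebRequest, only downloads the raw response and does not execute JavaScript, so content loaded by AJAX will still be missing.

```csharp
using System;
using System.Net;

static class PageDownloader
{
    // Creates a WebClient with browser-like request headers.
    // The header values are illustrative; use whatever the target site needs.
    public static WebClient CreateClient()
    {
        WebClient client = new WebClient();
        client.Headers[HttpRequestHeader.UserAgent] =
            "Mozilla/5.0 (Windows NT 6.1; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7";
        client.Headers[HttpRequestHeader.AcceptLanguage] = "en-us";
        return client;
    }

    // Downloads the page source as a string and disposes the client afterwards.
    public static string Download(string url)
    {
        using (WebClient client = CreateClient())
        {
            return client.DownloadString(url);
        }
    }
}
```

Calling PageDownloader.Download(url) then behaves like the two-line version above, with the extra headers attached to the request.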