How to extract dynamic ajax content from a web pag

2020-04-10 02:41发布

问题:

My requirement is to extract the required content from a web page. The page has a section which is being populated using ajax. When i view in page source it is not showing the content loaded using ajax. The section content will change based on check box selected. If we select 'India' check box then the section will display all the details of India. The page source will show only default content not the content displayed using ajax. I checked the page source after selecting the check box, still it shows only default value. How to get that section content,

回答1:

In C# you can use HTMLAgilityPack to craw data, but if you use webBrowser.DocumentText, you can't load ajax content from webpage to get xpath. So after webBrowser control loaded webpage completely. In Document_Complete method you add some codes below:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
this.webBrowser1.Document;
IHTMLDocument2 currentDoc =(IHTMLDocument2)this.webBrowser1.Document.DomDocument;

doc.LoadHtml(currentDoc.activeElement.innerHTML);


回答2:

Use Firebug under Firefox. Under NET tab you will see the extra content loaded.