I need to extract some content from a web page. The page has a section that is populated using AJAX, and when I view the page source, the AJAX-loaded content is not there. The section content changes based on which check box is selected: if we select the 'India' check box, the section displays all the details for India. The page source shows only the default content, not the content loaded via AJAX. I checked the page source again after selecting the check box, and it still shows only the default value. How can I get that section's content?
Answer 1:
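In C# you can use HtmlAgilityPack to scrape the data, but webBrowser.DocumentText only returns the HTML as it was originally downloaded, so the AJAX content is missing and your XPath queries will not find it. Instead, wait until the WebBrowser control has loaded the page completely, then read the live DOM. In the DocumentCompleted event handler, add code like the following. If the AJAX request finishes after that event fires, run the same code later, for example once the check box click has been handled.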
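// Requires a reference to Microsoft.mshtml (using mshtml;).
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
// DomDocument exposes the live DOM, including nodes inserted by AJAX.
IHTMLDocument2 currentDoc = (IHTMLDocument2)this.webBrowser1.Document.DomDocument;
// Serialize the whole <html> element rather than only the focused element.
doc.LoadHtml(currentDoc.body.parentElement.outerHTML);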
Answer 2:
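Use Firebug in Firefox. The Net tab lists the requests the page makes, so after you select the check box you will see the AJAX request that loads the extra content, along with its URL and response.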
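Once you know that request URL, one option is to skip the browser and call it directly. The following is only a rough sketch: the URL and the XPath are made-up placeholders, not taken from the page in the question, and it assumes the endpoint returns an HTML fragment (if it returns JSON you would need a JSON parser instead of HtmlAgilityPack). It also assumes C# 7.1 or later for the async Main.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

public static class AjaxSectionFetcher
{
    // Parses an HTML fragment and returns the trimmed text of the first node
    // matching the XPath, or null when nothing matches.
    public static string ExtractSection(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node?.InnerText.Trim();
    }

    public static async Task Main()
    {
        // Hypothetical endpoint: replace it with the request URL shown in the Net tab.
        const string ajaxUrl = "http://example.com/details?country=India";

        using (var client = new HttpClient())
        {
            // Download the same response the page's AJAX call receives.
            string html = await client.GetStringAsync(ajaxUrl);

            // Hypothetical XPath: adjust it to the markup the endpoint actually returns.
            Console.WriteLine(ExtractSection(html, "//div[@id='details']"));
        }
    }
}
```

Some endpoints only answer when the request carries the same headers, cookies, or POST body that the browser sent, so compare against what the Net tab shows if the response comes back empty.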