C# WPF Webbrowser msHTML - Explore DOM - Find Elem

Published 2019-07-12 01:25

Question:

I'm currently working on a personal project in C# using WPF and the WPF WebBrowser control. I really need to explore HTML DOM elements the way we do in JavaScript, PHP, etc.

In my MainWindow I have this field:

private mshtml.HTMLDocument mainDocument = new mshtml.HTMLDocument();

In my WebBrowser LoadCompleted handler I have this:

mainDocument = (mshtml.HTMLDocument) mainBrowser.Document;

OK, so far so good: this works.

Now if I do this:

mshtml.IHTMLElement elem = mainDocument.getElementById("MY_ID");

that also works nicely; I can read elem.innerHTML and things like that.

BUT my problem is that only HTMLDocument has methods to find elements by ID, by tag name, etc.

I don't know how to find elements inside an IHTMLElement. I tried things like casting IHTMLElement to IHTMLElement2, etc., but nothing worked.

If you have any ideas, please share them. A lot of people talk about hosting the WinForms WebBrowser, but I think there must be a way to do this with mshtml alone.

Thanks a lot! If you need more information, please feel free to ask.

PS: I'm French, so sorry about my English skills.
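For reference, a scoped search is possible with mshtml alone. The following is a minimal sketch, assuming a COM reference to the Microsoft HTML Object Library (mshtml, Windows only); the class and method names are hypothetical. The same COM object behind an IHTMLElement also implements IHTMLElement2, whose getElementsByTagName searches only below that element. For an id, the sketch walks the children collection recursively.

```csharp
using System.Collections.Generic;
using mshtml; // COM reference "Microsoft HTML Object Library" (assumed), Windows only

// Hypothetical helper, for illustration only; not part of mshtml.
public static class MshtmlScopedSearch
{
    // All elements with the given tag name *below* `container`.
    // IHTMLElement has no search methods, but the same COM object also
    // implements IHTMLElement2, whose getElementsByTagName is scoped to it.
    public static List<IHTMLElement> FindByTag(IHTMLElement container, string tagName)
    {
        var result = new List<IHTMLElement>();
        IHTMLElementCollection matches =
            ((IHTMLElement2)container).getElementsByTagName(tagName);
        foreach (IHTMLElement element in matches)
            result.Add(element);
        return result;
    }

    // Depth-first search for an id below `container`, using the `children`
    // property (typed as object in the interop assembly, so it is cast here).
    public static IHTMLElement FindDescendantById(IHTMLElement container, string id)
    {
        foreach (IHTMLElement child in (IHTMLElementCollection)container.children)
        {
            if (child.id == id)
                return child;
            IHTMLElement hit = FindDescendantById(child, id);
            if (hit != null)
                return hit;
        }
        return null;
    }
}
```

With the question's elem, MshtmlScopedSearch.FindByTag(elem, "a") would return only the links inside that element, not every link on the page.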

Answer 1:

If you want to parse an HTML document in WinForms or WPF, you can use the excellent HTML Agility Pack parser: http://html-agility-pack.net

using HtmlAgilityPack;

var url = "http://html-agility-pack.net/";
var web = new HtmlWeb();   // downloads the page over HTTP
var doc = web.Load(url);   // returns an HtmlAgilityPack.HtmlDocument

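Once the page is loaded into doc, you can query any tag or attribute, for example with XPath:

// SelectNodes returns null (not an empty collection) when nothing matches;
// First() needs using System.Linq;
var value = doc.DocumentNode
    .SelectNodes("//td/input")   // every <input> that is a direct child of a <td>
    ?.First()
    .Attributes["value"]?.Value;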

It's super easy: just explore the doc a bit and you will be able to make full use of it.
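Since the original question is about searching inside an element rather than the whole page, here is a minimal sketch of scoped searches with HTML Agility Pack, assuming the HtmlAgilityPack NuGet package is referenced. HapScopedSearch and its methods are hypothetical names; GetElementbyId, Descendants and SelectNodes are HtmlAgilityPack members. Note the leading "." in the XPath: without it, "//a" searches the whole document again, even when called on a child node.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack" (assumed to be referenced)

// Hypothetical helper, for illustration only; not part of HtmlAgilityPack.
public static class HapScopedSearch
{
    // Texts of every <a> below the element whose id is `containerId`, in document order.
    public static string[] LinkTextsInside(string html, string containerId)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode container = doc.GetElementbyId(containerId);
        if (container == null)
            return Array.Empty<string>();

        // Descendants(name) only walks the subtree under `container`.
        return container.Descendants("a").Select(a => a.InnerText).ToArray();
    }

    // Same search with XPath. The leading "." makes the path relative to
    // `container`; a bare "//a" would search the whole document.
    public static string[] LinkTextsInsideXPath(string html, string containerId)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode container = doc.GetElementbyId(containerId);
        HtmlNodeCollection links = container?.SelectNodes(".//a");

        // SelectNodes returns null, not an empty collection, when nothing matches.
        return links == null
            ? Array.Empty<string>()
            : links.Select(a => a.InnerText).ToArray();
    }
}
```

Descendants is the simpler choice when you only filter by tag name; the XPath variant is handy once you need conditions on attributes or positions.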

You can even load an HTML Agility Pack document from a WinForms WebBrowser control, like this:

// DocumentStream is the page source as a Stream
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.Load(webBrowser1.DocumentStream);

Or like this:

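// HtmlAgilityPack has no Load overload that takes a WinForms HtmlDocument,
// so pass the markup of the current DOM as a string instead
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(webBrowser1.Document.GetElementsByTagName("html")[0].OuterHtml);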

Thanks



Answer 2:

Thanks a lot @Sujit for your help. I don't have enough reputation to mark your answer as helpful, but I hope others will.

To get it working with the WPF WebBrowser, I did this:

var mainHTMLDoc = new HtmlAgilityPack.HtmlDocument();
mainHTMLDoc.LoadHtml((mainBrowser.Document as mshtml.HTMLDocument).documentElement.innerHTML);
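A slightly more defensive version of this line, as a sketch: it assumes the HtmlAgilityPack package and an mshtml COM reference, and the BrowserSnapshot name is hypothetical. It takes the object returned by WebBrowser.Document (typed as object in WPF), returns an empty document if nothing is loaded yet, and uses outerHTML so the <html> element itself is kept. Either way the result is a copy: changes made through HTML Agility Pack do not reach the page shown in the browser.

```csharp
using mshtml; // COM reference "Microsoft HTML Object Library" (assumed)
// HtmlAgilityPack types are fully qualified below (NuGet package "HtmlAgilityPack", assumed).

// Hypothetical helper, for illustration only.
public static class BrowserSnapshot
{
    // `browserDocument` is the value of WebBrowser.Document (typed as object in WPF).
    public static HtmlAgilityPack.HtmlDocument FromBrowserDocument(object browserDocument)
    {
        var snapshot = new HtmlAgilityPack.HtmlDocument();

        // The mshtml document object also implements IHTMLDocument3 (documentElement).
        IHTMLElement root = (browserDocument as IHTMLDocument3)?.documentElement;
        if (root != null)
        {
            // outerHTML keeps the <html> element itself; innerHTML would drop it.
            snapshot.LoadHtml(root.outerHTML);
        }

        // The result is a copy: editing it does not change the page in the browser.
        return snapshot;
    }
}
```

Usage in the LoadCompleted handler would then be: var mainHTMLDoc = BrowserSnapshot.FromBrowserDocument(mainBrowser.Document);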

To manipulate the results, you should add this directive:

using System.Linq;

After that you can do things like this:

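var table = mainHTMLDoc.GetElementbyId("MyID");
// browsers insert an implicit <tbody>, so it is present in HTML copied from the browser DOM
var rows = table.Element("tbody").Elements("tr");
foreach (var row in rows) {
    var cells = row.Elements("td").ToList();                      // ToList() comes from System.Linq
    var datacol1 = cells[0].Descendants("a").First().InnerHtml;   // first link in column 1
    var datacol2 = cells[1].InnerText;                            // text of column 2
}

Without using System.Linq you cannot call methods such as Count(), ElementAt() or First() on the sequences that Elements() returns, and those are very, very useful! Thanks again Sujit :)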