I'm working on a personal project in C# using WPF and the WPF WebBrowser control. I need to explore HTML DOM elements the way we do in JavaScript, PHP, etc.
In my MainWindow I have this field:
private mshtml.HTMLDocument mainDocument = new mshtml.HTMLDocument();
In my WebBrowser LoadCompleted callback I have this:
mainDocument = (mshtml.HTMLDocument) mainBrowser.Document;
OK, this works.
Now if I do this:
mshtml.IHTMLElement elem = mainDocument.getElementById("MY_ID");
that works too; I can read elem.innerHTML and so on.
BUT my problem is that only HTMLDocument has methods to find elements by ID, by tag name, etc.
I don't know how to find elements inside an IHTMLElement. I tried things like casting IHTMLElement to IHTMLElement2, but nothing worked.
Any ideas would be welcome. A lot of people talk about hosting the WinForms WebBrowser instead, but I think there must be a way to do this with mshtml alone.
Thanks a lot.
If you need more information, please feel free to ask.
PS: I'm French, so sorry about my English.
If you want to parse an HTML document in WinForms or WPF, you can use an excellent parser, Html Agility Pack. See the link below:
http://html-agility-pack.net
var url = "http://html-agility-pack.net/";
var web = new HtmlWeb();
var doc = web.Load(url);
After loading it into doc, you can read any attribute, tag, etc. (First() below needs using System.Linq.)
var value = doc.DocumentNode
.SelectNodes("//td/input")
.First()
.Attributes["value"].Value;
It's easy to use; explore the documentation a bit and you can make full use of it.
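One thing to watch for: SelectNodes returns null rather than an empty collection when nothing matches, so the chain above can throw on a page without a matching input. Here is a minimal sketch of a more defensive version, assuming the HtmlAgilityPack NuGet package is referenced. The sample markup and the AttributeReader / ReadFirstInputValue names are made up for illustration; they are not part of the library.

```csharp
using System;
using HtmlAgilityPack;

static class AttributeReader
{
    // Hypothetical helper: returns the "value" attribute of the first
    // <input> directly inside a <td>, or null when no such node exists.
    public static string ReadFirstInputValue(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when the XPath matches nothing.
        HtmlNode input = doc.DocumentNode.SelectSingleNode("//td/input");

        // GetAttributeValue falls back to the default if the attribute is missing.
        return input?.GetAttributeValue("value", string.Empty);
    }
}

static class Program
{
    static void Main()
    {
        // Made-up markup standing in for a downloaded page.
        const string html = "<table><tr><td><input value='hello' /></td></tr></table>";
        Console.WriteLine(AttributeReader.ReadFirstInputValue(html) ?? "(no match)");
    }
}
```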
You can also feed Html Agility Pack from a WinForms WebBrowser, like below:
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.Load(webBrowser1.DocumentStream);
Or you can pass the markup as a string, like this:
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(webBrowser1.DocumentText);
Thanks
Thanks a lot @Sujit for your help.
I don't have enough reputation to mark your answer as helpful, but I hope others will.
To get it working with the WPF WebBrowser I did this (mainHTMLDoc is an HtmlAgilityPack.HtmlDocument):
mainHTMLDoc.LoadHtml((mainBrowser.Document as mshtml.HTMLDocument).documentElement.innerHTML);
To work with the results you should add this:
using System.Linq;
After that you can do things like this:
var table = mainHTMLDoc.GetElementbyId("MyID");
var rows = table.Element("tbody").Elements("tr");
for (int i = 0; i < rows.Count(); i++) {
    var datacol1 = rows.ElementAt(i).Elements("td").ElementAt(0).Descendants("a").ElementAt(0).InnerHtml;
    var datacol2 = rows.ElementAt(i).Elements("td").ElementAt(1).InnerText;
}
Without System.Linq you cannot call Count() or ElementAt() on what Elements() returns, and those are very useful!
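For what it's worth, the same loop can also be written with foreach, which avoids re-enumerating rows on every ElementAt(i) call and lets you skip rows that do not have the expected cells. This is only a sketch, assuming the HtmlAgilityPack NuGet package; the sample markup and the TableReader / PrintRows names are invented for the example.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

static class TableReader
{
    // Hypothetical helper: prints the link markup of column 1 and the text of column 2.
    public static void PrintRows(HtmlDocument doc, string tableId)
    {
        HtmlNode table = doc.GetElementbyId(tableId);
        if (table == null) return;

        // Fall back to the table itself when the markup has no <tbody>.
        HtmlNode body = table.Element("tbody") ?? table;

        foreach (HtmlNode row in body.Elements("tr"))
        {
            var cells = row.Elements("td").ToList();
            if (cells.Count < 2) continue; // skip header or malformed rows

            HtmlNode link = cells[0].Descendants("a").FirstOrDefault();
            string col1 = link != null ? link.InnerHtml : string.Empty;
            string col2 = cells[1].InnerText;
            Console.WriteLine(col1 + " | " + col2);
        }
    }
}

static class Program
{
    static void Main()
    {
        // Made-up markup standing in for the page loaded in the WebBrowser.
        const string html =
            "<table id='MyID'><tbody>" +
            "<tr><td><a href='#'>first</a></td><td>one</td></tr>" +
            "<tr><td><a href='#'>second</a></td><td>two</td></tr>" +
            "</tbody></table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        TableReader.PrintRows(doc, "MyID");
    }
}
```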
Thanks again Sujit :)