I'm parsing an HTML page and I'm new to this kind of parsing. Could you suggest how to parse the following HTML?
HTML code: http://notepad.cc/share/CFRURbrk3r
Each room type has a list of sub-rooms, so I want to group them as parent and children in a list of objects, so that each child can be accessed later.
This is as far as I have got; it selects the values but does not yet add them to objects. Besides Fizzler, is there any other parser I could use in this case?
var uricontent = File.ReadAllText("TestHtml/Bew.html");
var html = new HtmlDocument(); // from the HTML Agility Pack
html.LoadHtml(uricontent);
var doc = html.DocumentNode;
var rooms = (from r in doc.QuerySelectorAll(".rates")
             from s in r.QuerySelectorAll(".rooms")
             from rd in r.QuerySelectorAll(".rate")
             select new
             {
                 Name = rd.QuerySelector(".rate-description").InnerText.CleanInnerText(),
                 Price = r.QuerySelector(".rate-price").InnerText.CleanInnerText(),
                 RoomType = s.QuerySelector("tr td h2").InnerText.CleanInnerText()
             }).ToArray();
Update:
Personally, I wouldn't use an array. I would use a List. A List lets you insert particular nodes at particular positions and group them accordingly. Then you could simply:
Which would allow you to quickly filter through the content, since each list item is stored. Some examples.
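As a rough illustration of the List idea, here is a minimal sketch of a parent-child structure that can be filtered afterwards. The type names (RoomType, Rate), the BuildSample helper and the sample values are all made up for this example; they are not from the original post or from the linked HTML.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical child type: one rate offered for a room type.
public class Rate
{
    public string Name { get; set; }
    public string Price { get; set; }
}

// Hypothetical parent type: a room type that owns a list of rates.
public class RoomType
{
    public string Name { get; set; }
    public List<Rate> Rates { get; } = new List<Rate>();
}

public static class RoomListDemo
{
    // Builds a small in-memory sample; the values are invented.
    public static List<RoomType> BuildSample()
    {
        var doubleRoom = new RoomType { Name = "Double Room" };
        doubleRoom.Rates.Add(new Rate { Name = "Room only", Price = "100" });
        doubleRoom.Rates.Add(new Rate { Name = "Breakfast included", Price = "120" });

        var suite = new RoomType { Name = "Suite" };
        suite.Rates.Add(new Rate { Name = "Room only", Price = "200" });

        return new List<RoomType> { doubleRoom, suite };
    }

    public static void Main()
    {
        var rooms = BuildSample();

        // Filter the parents, then walk each parent's children.
        foreach (var room in rooms.Where(r => r.Name.Contains("Double")))
        {
            foreach (var rate in room.Rates)
            {
                Console.WriteLine(room.Name + ": " + rate.Name + " = " + rate.Price);
            }
        }

        // Flatten all children across parents and filter them by name.
        var roomOnlyPrices = rooms
            .SelectMany(r => r.Rates)
            .Where(rate => rate.Name == "Room only")
            .Select(rate => rate.Price)
            .ToList();
        Console.WriteLine(string.Join(", ", roomOnlyPrices));
    }
}
```

Keeping the children in a List on the parent means the grouping is preserved, and LINQ's Where and SelectMany cover most of the filtering you would otherwise write by hand.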
Update:
Another item I forgot to mention, the Html Agility Pack can do the following:
It can also grab remote or local pages.
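For reference, here is a minimal sketch of the three usual ways to get a document into the Html Agility Pack: from a string, from a local file, and from a URL. The wrapper class and method names (LoadExamples, FromString, FromFile, FromUrl) are my own for illustration; the calls inside them (LoadHtml, Load, HtmlWeb.Load) are the library's.

```csharp
using System;
using HtmlAgilityPack;

public static class LoadExamples
{
    // Parse markup that is already in memory.
    public static HtmlDocument FromString(string markup)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(markup);
        return doc;
    }

    // Parse a file on the local disk.
    public static HtmlDocument FromFile(string path)
    {
        var doc = new HtmlDocument();
        doc.Load(path);
        return doc;
    }

    // Download a remote page and parse it.
    public static HtmlDocument FromUrl(string url)
    {
        var web = new HtmlWeb();
        return web.Load(url);
    }
}
```

Whichever way the document is loaded, the rest of the parsing code works against the same DocumentNode, so switching from a saved test file to the live page should only change the loading call.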
I would actually download the Html Agility Pack from NuGet. It is powerful and robust, and it will more than likely make it easier to scrape the desired data. You can install it from the Package Manager Console:
Install-Package HtmlAgilityPack
A great example can be found in this question.
The premise is simple:
This example shows the syntax, but it should be far easier to grab particular nodes out of the page and manipulate them accordingly with the HtmlAgilityPack. Hopefully this points you in a better direction.
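To tie this back to the parent-child grouping in the question, here is a minimal sketch using the Html Agility Pack's XPath methods. The sample markup is an assumption: I only reused the class names from the question's selectors (rooms, rate, rate-description, rate-price), so the real page may well be structured differently. The RoomGroup, RateInfo and RoomParser names are also made up for this example.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical child type.
public class RateInfo
{
    public string Description { get; set; }
    public string Price { get; set; }
}

// Hypothetical parent type holding its children.
public class RoomGroup
{
    public string RoomType { get; set; }
    public List<RateInfo> Rates { get; } = new List<RateInfo>();
}

public static class RoomParser
{
    // Assumed markup for illustration only; not the HTML from the question.
    public const string SampleHtml = @"
<table class='rooms'>
  <tr><td><h2>Double Room</h2></td></tr>
  <tr class='rate'><td class='rate-description'>Room only</td><td class='rate-price'>100</td></tr>
  <tr class='rate'><td class='rate-description'>Breakfast included</td><td class='rate-price'>120</td></tr>
</table>
<table class='rooms'>
  <tr><td><h2>Suite</h2></td></tr>
  <tr class='rate'><td class='rate-description'>Room only</td><td class='rate-price'>200</td></tr>
</table>";

    public static List<RoomGroup> Parse(string markup)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(markup);

        var result = new List<RoomGroup>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var roomNodes = doc.DocumentNode.SelectNodes("//table[@class='rooms']");
        if (roomNodes == null) return result;

        foreach (var roomNode in roomNodes)
        {
            // The leading dot keeps the query relative to the current node.
            var heading = roomNode.SelectSingleNode(".//h2");
            var group = new RoomGroup
            {
                RoomType = heading == null ? "" : heading.InnerText.Trim()
            };

            var rateNodes = roomNode.SelectNodes(".//tr[@class='rate']");
            if (rateNodes != null)
            {
                foreach (var rateNode in rateNodes)
                {
                    var desc = rateNode.SelectSingleNode("./td[@class='rate-description']");
                    var price = rateNode.SelectSingleNode("./td[@class='rate-price']");
                    group.Rates.Add(new RateInfo
                    {
                        Description = desc == null ? "" : desc.InnerText.Trim(),
                        Price = price == null ? "" : price.InnerText.Trim()
                    });
                }
            }

            result.Add(group);
        }

        return result;
    }
}
```

The important design point is the nested loop: the rates are selected from each room node rather than from the whole document, so every child ends up under its own parent instead of being cross-joined with every room type.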