I want to use the HTML Agility Pack to parse tables from complex web pages, but I am somewhat lost in the object model.
I looked at the link example, but did not find any table data this way.
Can I use XPath to get the tables? After loading the data, I am basically lost as to how to get at the tables. I have done this in Perl before (HTML::TableParser) and it was a bit clumsy, but it worked.
I would also be happy if someone could just shed some light on the right object order for the parsing.
In my case, there is a single table which happens to be a device list from a router. If you wish to read the table using TR/TH/TD (row, header, data) instead of a matrix as mentioned above, you can do something like the following:
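A minimal sketch of what this could look like, under stated assumptions. The names TableRow, HtmlBody and TableReader are hypothetical and chosen only for illustration; the sketch also assumes each row holds at most one TH and one TD. The only HTML Agility Pack calls used are HtmlDocument.LoadHtml, SelectNodes, SelectSingleNode and InnerText.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical container: one header cell and one data cell per row.
public class TableRow
{
    public string Header { get; set; }
    public string Data { get; set; }
}

// Hypothetical holder for tag names, so no string literals sit in the logic.
public static class HtmlBody
{
    public const string Tr = "tr";
    public const string Th = "th";
    public const string Td = "td";
}

public static class TableReader
{
    public static List<TableRow> ReadRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var rows = new List<TableRow>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection trNodes = doc.DocumentNode.SelectNodes("//" + HtmlBody.Tr);
        if (trNodes == null) return rows;

        foreach (HtmlNode tr in trNodes)
        {
            // Relative XPath: first TH / TD child of this row, or null if absent.
            HtmlNode th = tr.SelectSingleNode(HtmlBody.Th);
            HtmlNode td = tr.SelectSingleNode(HtmlBody.Td);

            rows.Add(new TableRow
            {
                Header = th?.InnerText.Trim(), // stays null for a row without a header
                Data = td?.InnerText.Trim()
            });
        }
        return rows;
    }
}
```

Calling TableReader.ReadRows with the page source returns one TableRow per TR element, including rows from nested tables, since the XPath starts with "//".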
TableRow is just a simple object with Header and Data as properties. The approach takes care of nulls and also of a row without a header. The HtmlBody object with the constants hanging off it can probably be deduced readily, but I apologize for it all the same: I come from a world where anything in quotes in your code should be either a constant or localizable.
How about something like this, using the HTML Agility Pack:
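As a rough sketch of the XPath route: select every table with "//table", then walk its rows and cells with relative expressions. The class name TableDump, the method Describe and the sample markup are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class TableDump
{
    // Returns one line per table, row and cell found in the given HTML.
    public static List<string> Describe(string html)
    {
        var lines = new List<string>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNodeCollection tables = doc.DocumentNode.SelectNodes("//table");
        if (tables == null) return lines; // no <table> in the document

        foreach (HtmlNode table in tables)
        {
            lines.Add("table: " + table.Id);

            // "tr" matches direct children only; use ".//tr" if the markup
            // wraps rows in <tbody>.
            HtmlNodeCollection rows = table.SelectNodes("tr");
            if (rows == null) continue;

            foreach (HtmlNode row in rows)
            {
                lines.Add("row");

                // Header and data cells in document order.
                HtmlNodeCollection cells = row.SelectNodes("th|td");
                if (cells == null) continue;

                foreach (HtmlNode cell in cells)
                    lines.Add("cell: " + cell.InnerText.Trim());
            }
        }
        return lines;
    }

    public static void Main()
    {
        // Sample markup, invented for this sketch.
        string html = @"<html><body>
<table id=""devices"">
  <tr><th>Name</th><th>IP</th></tr>
  <tr><td>laptop</td><td>192.168.0.2</td></tr>
</table>
</body></html>";

        foreach (string line in Describe(html))
            Console.WriteLine(line);
    }
}
```

The null checks matter because SelectNodes returns null rather than an empty collection when an expression matches nothing.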
Note that you can make it prettier with LINQ-to-Objects if you want:
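One way this could look, again only as a sketch: the same table/row/cell traversal written as a single query. The helper Select is hypothetical; it exists only to turn the null that SelectNodes returns on no match into an empty sequence so the query does not throw.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TableQuery
{
    // Hypothetical helper: null-safe wrapper around SelectNodes.
    private static IEnumerable<HtmlNode> Select(HtmlNode node, string xpath)
    {
        HtmlNodeCollection found = node.SelectNodes(xpath);
        return found == null
            ? Enumerable.Empty<HtmlNode>()
            : found.Cast<HtmlNode>();
    }

    // Returns "tableId: cellText" for every TH/TD cell in every table.
    public static List<string> CellTexts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var query =
            from table in Select(doc.DocumentNode, "//table")
            from row in Select(table, "tr")
            from cell in Select(row, "th|td")
            select table.Id + ": " + cell.InnerText.Trim();

        return query.ToList();
    }

    public static void Main()
    {
        // Sample markup, invented for this sketch.
        string html = "<table id=\"devices\"><tr><th>Name</th></tr>"
                    + "<tr><td>laptop</td></tr></table>";

        foreach (string text in CellTexts(html))
            Console.WriteLine(text);
    }
}
```

The query is lazy until ToList is called, so it can also be filtered further (for example by table id) before any cell text is read.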
Line from above answer:
This doesn't work in VS 2015 C#. You cannot construct an HtmlDocument any more. Another MS "feature" that makes things more difficult to use. Try HtmlAgilityPack.HtmlWeb and check out this link for some sample code.
The simplest way I've found to get the XPath for a particular element is to install the Firebug extension for Firefox, go to the site or web page, and press F12 to bring up Firebug. Right-click the element on the page that you want to query and select "Inspect Element"; Firebug will select the element in its IDE. Then right-click the element in Firebug and choose "Copy XPath". This gives you the exact XPath query you need to get the element you want using the HTML Agility Pack.
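A short sketch of how a copied XPath might be used together with HtmlWeb. The URL and the XPath string are placeholders, and CopiedXPathDemo and TextAt are hypothetical names. One caveat worth checking: browsers usually insert a tbody element into tables when they build the page, so a path copied from the browser may contain a /tbody step that does not exist in the raw HTML the library downloads; removing that step is often necessary.

```csharp
using System;
using HtmlAgilityPack;

public static class CopiedXPathDemo
{
    // Returns the trimmed text of the node at the given XPath,
    // or null when the expression matches nothing.
    public static string TextAt(HtmlDocument doc, string xpath)
    {
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node == null ? null : node.InnerText.Trim();
    }

    public static void Main()
    {
        // Placeholder URL: replace with the page you want to parse.
        var web = new HtmlWeb();
        HtmlDocument doc = web.Load("http://192.168.0.1/devices.html");

        // Placeholder XPath, as it might look after removing a /tbody step
        // from a path copied out of the browser.
        string copiedXPath = "/html/body/table/tr[2]/td[1]";

        Console.WriteLine(TextAt(doc, copiedXPath) ?? "(no match)");
    }
}
```

HtmlWeb.Load downloads the page and returns an HtmlDocument, so the document is obtained without calling the constructor directly.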