I'm trying to use HtmlAgilityPack to parse information from a web page. This is my code:
    using System;
    using HtmlAgilityPack;

    namespace htmparsing
    {
        class MainClass
        {
            public static void Main (string[] args)
            {
                string url = "https://bugs.eclipse.org";
                HtmlWeb web = new HtmlWeb();
                HtmlDocument doc = web.Load(url);
                foreach (HtmlNode node in doc) {
                    // do something here with "node"
                }
            }
        }
    }
But when I try to access doc.DocumentElement.SelectNodes, DocumentElement does not appear in the member list. I added HtmlAgilityPack.dll to the references, so I don't know what the problem is.
I have an article that demonstrates scraping DOM elements with HAP (HTML Agility Pack) in ASP.NET. It walks through the whole process step by step, so you can have a look and try it:
Scraping HTML DOM elements using HtmlAgilityPack (HAP) in ASP.NET
As for your code, it works fine for me. I tried it the same way you did, with a single change.
I got the expected output. The problem is that you are asking the HtmlDocument object for DocumentElement, when the member you need is DocumentNode. Here is a response from a developer of HtmlAgilityPack about the problem you are facing:
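To illustrate the DocumentNode change, here is a minimal sketch. It is not the exact code I ran: the inline HTML string, the class name and the `//a[@href]` XPath are assumptions chosen for illustration, and it parses a literal string with LoadHtml instead of fetching your URL.

```csharp
using System;
using HtmlAgilityPack;

class DocumentNodeSketch
{
    static void Main()
    {
        // Inline HTML (an assumption for this sketch) so nothing is fetched over the network.
        var doc = new HtmlDocument();
        doc.LoadHtml("<html><body><a href='/one'>One</a><a href='/two'>Two</a></body></html>");

        // Start from DocumentNode, the root HtmlNode, instead of DocumentElement.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");

        // SelectNodes can return null when nothing matches, so guard before iterating.
        if (links == null)
        {
            Console.WriteLine("No links found.");
            return;
        }

        foreach (HtmlNode node in links)
        {
            // Print each link target followed by its visible text.
            Console.WriteLine(node.GetAttributeValue("href", "") + " -> " + node.InnerText);
        }
    }
}
```

With a document loaded through HtmlWeb.Load(url), as in your code, the same doc.DocumentNode.SelectNodes call should apply.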
HTMLDocument.DocumentElement not in object browser
The behavior you are seeing is correct.
Look at what you're actually doing: http://htmlagilitypack.codeplex.com/SourceControl/latest#Release/1_4_0/HtmlAgilityPack/HtmlNode.cs .
You're asking the top element to select nodes matching some XPath. Unless your XPath expression starts with a //, you're asking it for relative nodes, which are descendant nodes. A document element is not a descendant of itself, because no element is a descendant of itself.
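The difference between a relative expression and one starting with // can be sketched as follows. This is only an illustration under assumptions: the inline HTML, the class name and the Count helper are hypothetical, and the null check is there because SelectNodes may return null when nothing matches.

```csharp
using System;
using HtmlAgilityPack;

class XPathContextSketch
{
    // Hypothetical helper: treats a null result (no matches) as zero nodes.
    static int Count(HtmlNodeCollection nodes) => nodes == null ? 0 : nodes.Count;

    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<html><body><div><p>text</p></div></body></html>");

        // The top <html> element of the parsed document.
        HtmlNode top = doc.DocumentNode.SelectSingleNode("/html");

        // Relative expression: searches beneath 'top', so 'top' itself cannot match.
        Console.WriteLine("relative 'html': " + Count(top.SelectNodes("html")));

        // Expression starting with //: evaluated from the document root,
        // so the <html> element is a candidate again.
        Console.WriteLine("absolute '//html': " + Count(top.SelectNodes("//html")));
    }
}
```

The first count should come out lower than the second, which is the behavior described in the quoted response.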