HTML Parsing Libraries for .NET [closed]

Posted 2019-01-19 23:28

I'm looking for libraries to parse HTML to extract links, forms, tags etc.

LGPL or any other license friendly to commercial development is preferable.

Have you got any experience with one of these libraries? Or could you recommend another similar library?

Tags: .net html dom parsing

1 answer

成全新的幸福

#2 · 2019-01-20 00:00

The HTML Agility Pack has examples of exactly this kind of task and uses XPath for familiar queries. For example (from its home page), finding all links is simply:

foreach(HtmlNode link in doc.DocumentElement.SelectNodes("//a[@href]")) {
    //...
}

EDIT

As of 6/19/2012, neither the code above nor the only code sample on the HTML Agility Pack Examples page works as written. Both need a slight tweak, using DocumentNode in place of DocumentElement:

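HtmlDocument doc = new HtmlDocument();
doc.Load("file.htm");

// SelectNodes returns null when no node matches, so guard before iterating.
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null)
{
  foreach (HtmlNode link in links)
  {
    HtmlAttribute att = link.Attributes["href"];
    att.Value = Foo(att); // Foo is a placeholder for whatever fixes the link
  }
}
doc.Save("file.htm");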
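Since the question is about extracting links rather than rewriting them, here is a minimal sketch of that case. It assumes the HtmlAgilityPack NuGet package is referenced. The `LinkExtractor` class and its `ExtractLinks` method are hypothetical names made up for illustration and are not part of the library. The sketch parses markup from a string with `LoadHtml` and collects every `href` value.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is installed

// Hypothetical helper for illustration; not part of HTML Agility Pack.
public static class LinkExtractor
{
    // Returns the href value of every <a> element that has an href attribute.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string instead of a file

        var hrefs = new List<string>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//a[@href]");
        if (nodes == null)
        {
            return hrefs;
        }

        foreach (HtmlNode link in nodes)
        {
            // The second argument is the fallback used if the attribute is missing.
            hrefs.Add(link.GetAttributeValue("href", string.Empty));
        }
        return hrefs;
    }
}
```

Calling `LinkExtractor.ExtractLinks(html)` on a page with no anchors should give back an empty list rather than throwing, which is the main reason for the null check. The same pattern should carry over to forms or other tags by changing the XPath expression, for example `//form` or `//img[@src]`.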
