Does the .NET Framework offer methods to parse an HTML string?

Posted 2019-06-16 16:33

Given that I can't use HTML Agility Pack, only plain .NET: say I have a string containing some HTML that I need to parse and edit in the following ways:

  • find specific controls in the hierarchy by id or by tag
  • modify (and ideally create) attributes of those found elements

Are there methods available in .NET to do so?

4 answers
在下西门庆
Reply #2 · 2019-06-16 16:55

The relevant System.Windows.Forms members are HtmlDocument, GetElementById, and HtmlElement.

You can create a dummy HTML document by hosting it in a WebBrowser control:

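// Needs a reference to System.Windows.Forms (using System.Windows.Forms;)
// and must run on an STA thread, because WebBrowser wraps an ActiveX control.
WebBrowser w = new WebBrowser();
w.Navigate(String.Empty);          // initialise an empty document to write into
HtmlDocument doc = w.Document;
doc.Write("<html><head></head><body><img id=\"myImage\" src=\"c:\"/><a id=\"myLink\" href=\"myUrl\"/></body></html>");
Console.WriteLine(doc.Body.Children.Count);                            // direct children of <body>
Console.WriteLine(doc.GetElementById("myImage").GetAttribute("src"));  // lookup by id
Console.WriteLine(doc.GetElementById("myLink").GetAttribute("href"));
Console.ReadKey();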

Output:

2

file:///c:

about:myUrl

Editing elements:

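// This rewrites the element's markup as text, so it only works while the
// attribute is serialized exactly as src="c:". HtmlElement.SetAttribute(name, value)
// is the more direct way to change or create an attribute.
HtmlElement imageElement = doc.GetElementById("myImage");
string newSource = "d:";
imageElement.OuterHtml = imageElement.OuterHtml.Replace(
        "src=\"c:\"",
        "src=\"" + newSource + "\"");
Console.WriteLine(doc.GetElementById("myImage").GetAttribute("src"));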

Output:

file:///d:

够拽才男人
Reply #3 · 2019-06-16 17:14

You can look at how HTML Agility Pack works; it is itself a .NET library. If you inspect the assembly with a decompiler, you will see that it is built on the framework's own classes and could be reproduced if you wanted to, but you would be doing nothing more than relocating the code, not making it any more ".NET".

何必那么认真
Reply #4 · 2019-06-16 17:15

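Assuming you're dealing with well-formed HTML (every tag closed, every attribute value quoted), you could simply treat the text as an XML document. The framework has plenty of features for exactly what you're asking: XmlDocument can find elements by tag name or XPath and read, change, or add attributes.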

http://msdn.microsoft.com/en-us/library/system.xml.xmldocument.aspx
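
To make that concrete, here is a minimal sketch of the XmlDocument route, assuming the input really is well-formed. The class name XmlHtmlEditor and the helper SetAttributeById are made up for illustration; only XmlDocument, SelectSingleNode and XmlElement.SetAttribute come from the framework.

```csharp
using System;
using System.Xml;

public static class XmlHtmlEditor
{
    // Hypothetical helper (not a framework API): loads well-formed markup,
    // finds the first element whose id attribute matches, and sets the
    // named attribute, creating it if it does not exist yet.
    public static string SetAttributeById(string markup, string id, string name, string value)
    {
        var doc = new XmlDocument();
        doc.LoadXml(markup); // throws XmlException if the markup is not well-formed

        // Simple XPath lookup by id. Assumes the id contains no single quote.
        var element = (XmlElement)doc.SelectSingleNode("//*[@id='" + id + "']");
        if (element == null)
            throw new ArgumentException("No element with id '" + id + "'.");

        element.SetAttribute(name, value);
        return doc.OuterXml;
    }

    public static void Main()
    {
        string html =
            "<html><head></head><body>" +
            "<img id=\"myImage\" src=\"c:\" />" +
            "<a id=\"myLink\" href=\"myUrl\"></a>" +
            "</body></html>";

        // Modify an existing attribute, then create a new one.
        string edited = SetAttributeById(html, "myImage", "src", "d:");
        edited = SetAttributeById(edited, "myLink", "title", "My link");
        Console.WriteLine(edited);
    }
}
```

To find elements by tag rather than by id, doc.GetElementsByTagName("img") returns every matching node in document order.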

对你真心纯属浪费
Reply #5 · 2019-06-16 17:15

Aside from the HTML Agility Pack and porting HtmlUnit over to C#, the options that sound workable are:

  • Most obviously, use regex (System.Text.RegularExpressions).
  • Use an XML parser (since HTML is a system of tags, treat it like an XML document?).
  • LINQ?

One thing I do know is that parsing HTML like XML may cause you to run into a few problems. XML and HTML are not the same. Read about it: here
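
As a rough illustration of both the LINQ idea and the XML caveat, the sketch below uses LINQ to XML (System.Xml.Linq). The class LinqToXmlSketch and its two helpers are hypothetical names chosen for this example. It queries well-formed markup, then shows that ordinary HTML with an unclosed tag and a valueless attribute is rejected by the XML parser.

```csharp
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class LinqToXmlSketch
{
    // Hypothetical helper: returns the value of the given attribute for every
    // element with the given tag name, skipping elements that lack it.
    public static string[] AttributeValues(string markup, string tag, string attribute)
    {
        XDocument doc = XDocument.Parse(markup); // requires well-formed markup
        return doc.Descendants(tag)
                  .Select(e => (string)e.Attribute(attribute)) // null when absent
                  .Where(v => v != null)
                  .ToArray();
    }

    // Hypothetical helper: true when the XML parser accepts the markup at all.
    public static bool IsParsableAsXml(string markup)
    {
        try
        {
            XDocument.Parse(markup);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        string xhtml = "<body><a href=\"one.html\">1</a><a href=\"two.html\">2</a></body>";
        // Prints: one.html, two.html
        Console.WriteLine(string.Join(", ", AttributeValues(xhtml, "a", "href")));

        // Common in real HTML, invalid as XML: <br> is never closed and
        // "disabled" has no value.
        string html = "<body><br><input disabled></body>";
        // Prints: False
        Console.WriteLine(IsParsableAsXml(html));
    }
}
```

So the XML-based options are only safe when you control the markup or know it is XHTML-style.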

Also, here is a post about LINQ vs. regex.
