How to parse an HTML node's attributes

Posted 2019-07-29 07:50

I am using C# and need to parse an HTML tag and read its attributes into key-value pairs. For example, given the following HTML snippet:

<DIV myAttribute style="BORDER-BOTTOM: medium none; BACKGROUND-COLOR: transparent; BORDER-TOP: medium none" id=my_ID anotherAttribNamedDIV class="someclass">

Please note that the attributes can take three forms:
1. key="value" pairs, e.g. class="someclass"
2. key=value pairs, e.g. id=my_ID (no quotes around the value)
3. plain attributes, e.g. myAttribute, which have no value at all

I need to store them in a dictionary as key-value pairs, as follows:
key=myAttribute value=""
key=style value="BORDER-BOTTOM: medium none; BACKGROUND-COLOR: transparent; BORDER-TOP: medium none"
key=id value="my_ID"
key=anotherAttribNamedDIV value=""
key=class value="someclass"

I am looking for regular expressions to do this.

2 answers
SAY GOODBYE
#2 · 2019-07-29 08:15

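You can do this with the HtmlAgilityPack:

// Requires the HtmlAgilityPack NuGet package and "using HtmlAgilityPack;".
string myDiv = @"<DIV myAttribute style=""BORDER-BOTTOM: medium none; BACKGROUND-COLOR: transparent; BORDER-TOP: medium none"" id=my_ID anotherAttribNamedDIV class=""someclass""></DIV>";
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(myDiv);
// The XPath "div" selects the div element directly under the document root.
HtmlNode node = doc.DocumentNode.SelectSingleNode("div");

// Literal1 is an ASP.NET Literal control, used here only to display the result.
Literal1.Text = "";

// Write each attribute as "name: value" on its own line.
foreach (HtmlAttribute attr in node.Attributes)
{
    Literal1.Text += attr.Name + ": " + attr.Value + "<br />";
}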
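
To end up with the dictionary the question asks for, the same loop can fill a Dictionary<string, string> instead of writing to a control. The following is only a sketch under a few assumptions: AttributeReader and ReadAttributes are hypothetical names made up for this example, the fragment is assumed to contain a div, and the keys are taken from OriginalName so that they keep the casing used in the markup.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class AttributeReader
{
    // Hypothetical helper (not part of HtmlAgilityPack): collects the
    // attributes of the first div in the fragment into a dictionary.
    public static Dictionary<string, string> ReadAttributes(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Case-insensitive keys, so both "id" and "ID" find the same entry.
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        HtmlNode node = doc.DocumentNode.SelectSingleNode("//div");
        if (node == null)
        {
            return result; // no div found: return an empty dictionary
        }

        foreach (HtmlAttribute attr in node.Attributes)
        {
            // Fall back to "" so valueless attributes never store null.
            // If an attribute is repeated, the last occurrence wins.
            result[attr.OriginalName] = attr.Value ?? "";
        }
        return result;
    }

    public static void Main()
    {
        string myDiv = @"<DIV myAttribute style=""BORDER-BOTTOM: medium none; BACKGROUND-COLOR: transparent; BORDER-TOP: medium none"" id=my_ID anotherAttribNamedDIV class=""someclass""></DIV>";

        // Print every pair in the key=... value="..." layout used in the question.
        foreach (KeyValuePair<string, string> pair in ReadAttributes(myDiv))
        {
            Console.WriteLine("key=" + pair.Key + " value=\"" + pair.Value + "\"");
        }
    }
}
```

Letting the parser handle quoted, unquoted and valueless attributes avoids having to cover each of those cases in a hand-written regular expression.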
Bombasti
#3 · 2019-07-29 08:25
// Load the document straight from a web address; url is assumed to hold the page's URL.
HtmlDocument docHtml = new HtmlWeb().Load(url);