How to extract the href attribute from a string in C#?

Posted 2020-07-18 10:32

I have a method that returns a string in the following format:

string tableTag = "<th><a href=\"Boot_53.html\">135 Boot</a></th>";

I want to get the value of the href attribute and store it in another string called link:

string link = "Boot_53.html";

In other words, link should be assigned the value of the href attribute in the string. How can I accomplish that?

5 Answers
你好瞎i
#2 · 2020-07-18 11:00

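You can use Regex (this needs using System; and using System.Text.RegularExpressions;):

string input = "<th><a href=\"Boot_53.html\">135 Boot</a></th>";
// [^\"]* stops at the first closing quote, so later attributes are not captured.
string pattern = "href=\"([^\"]*)\"";
Match match = Regex.Match(input, pattern);
if (match.Success)
{
    // Group 1 holds the text between the quotes.
    string link = match.Groups[1].Value;
    Console.WriteLine(link);
}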
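How the capture group is written matters once the tag carries more than one attribute. The following is a minimal sketch, assuming a hypothetical input that adds a class attribute to the anchor from the question; the class name HrefRegexDemo is only for illustration. It contrasts a greedy (.*) group with a negated character class.

```csharp
using System;
using System.Text.RegularExpressions;

class HrefRegexDemo
{
    static void Main()
    {
        // Hypothetical input: the anchor from the question plus an extra attribute.
        string input = "<th><a href=\"Boot_53.html\" class=\"nav\">135 Boot</a></th>";

        // Greedy group: (.*) runs on to the last double quote in the string.
        string greedy = Regex.Match(input, "href=\"(.*)\"").Groups[1].Value;

        // Negated character class: stops at the first closing quote.
        string bounded = Regex.Match(input, "href=\"([^\"]*)\"").Groups[1].Value;

        Console.WriteLine(greedy);   // Boot_53.html" class="nav
        Console.WriteLine(bounded);  // Boot_53.html
    }
}
```

Even the bounded pattern only handles double-quoted values, so for anything beyond a known, fixed snippet an HTML parser is the safer choice.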
够拽才男人
#3 · 2020-07-18 11:02

You could use an HTML parser such as Html Agility Pack to parse the input HTML and extract the information you are looking for:

using HtmlAgilityPack;
using System;

class Program
{
    static void Main(string[] args)
    {
        var doc = new HtmlDocument();
        string tableTag = "<th><a href=\"Boot_53.html\">135 Boot</a></th>";
        doc.LoadHtml(tableTag);

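        // Select the first anchor that actually has an href attribute,
        // so the attribute lookup below cannot return null.
        var anchor = doc.DocumentNode.SelectSingleNode("//a[@href]");
        if (anchor != null)
        {
            string link = anchor.Attributes["href"].Value;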
            Console.WriteLine(link);
        }
    }
}
疯言疯语
#4 · 2020-07-18 11:02

Use HtmlAgilityPack to parse HTML:

var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(tableTag);
// SelectSingleNode returns null when nothing matches, so link is null in that case.
string link = doc.DocumentNode.SelectSingleNode("//a[@href]")?.Attributes["href"].Value;
看我几分像从前
#5 · 2020-07-18 11:13

You can use AngleSharp as an alternative to HtmlAgilityPack:

using System.Linq;
using AngleSharp;

// The await below must run inside an async method.
var context = BrowsingContext.New(Configuration.Default);

string tableTag = "<th><a href=\"Boot_53.html\">135 Boot</a></th>";

var document = await context.OpenAsync(req => req.Content(tableTag));

// Take the first <a> element in the parsed document, if there is one.
var anchor = document.All.FirstOrDefault(x => x.LocalName == "a");
if (anchor != null)
{
    string link = anchor.GetAttribute("href"); // "Boot_53.html"
}
时光不老,我们不散
#6 · 2020-07-18 11:26

If you know that the HTML is actually XHTML (HTML that conforms, more or less, to the XML standard), you can parse it with tools dedicated to XML, which are generally simpler than those for HTML.

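// Needs using System.Linq; and using System.Xml.Linq;
var hrefLink = XElement.Parse("<th><a href=\"Boot_53.html\">135 Boot</a></th>")
                       .Descendants("a")
                       // The string cast yields null when an <a> has no href, instead of throwing.
                       .Select(x => (string)x.Attribute("href"))
                       .FirstOrDefault(href => href != null);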
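The XML route only works when the markup is well-formed XML, so it may be worth guarding the parse. Below is a minimal sketch, assuming a hypothetical helper named TryGetHref (not part of any library) that returns null when the input cannot be parsed or contains no href. The second input is an assumption made up for illustration: an unclosed <br> is fine in HTML but not in XML.

```csharp
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

class XhtmlHrefDemo
{
    // Hypothetical helper: first href value, or null if the markup is not
    // well-formed XML or no <a> element carries an href attribute.
    static string TryGetHref(string markup)
    {
        try
        {
            return XElement.Parse(markup)
                           .Descendants("a")
                           .Select(a => (string)a.Attribute("href"))
                           .FirstOrDefault(href => href != null);
        }
        catch (XmlException)
        {
            // XElement.Parse throws XmlException on malformed XML.
            return null;
        }
    }

    static void Main()
    {
        // Well-formed input from the question.
        Console.WriteLine(TryGetHref("<th><a href=\"Boot_53.html\">135 Boot</a></th>") ?? "(none)");

        // Unclosed <br>: acceptable HTML, but not well-formed XML.
        Console.WriteLine(TryGetHref("<th><a href=\"Boot_53.html\">135<br>Boot</a></th>") ?? "(none)");
    }
}
```

Note that Descendants("a") matches elements without a namespace; a full XHTML document that declares the XHTML namespace would need a namespace-qualified name instead.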