Parsing HTML String [duplicate]

This question already has an answer here:

What is the best way to parse html in C#? [closed] (15 answers)

Is there a way to parse an HTML string in .NET code-behind, similar to DOM parsing?

e.g. GetElementByTagName("abc").GetElementByTagName("tag")

I have this code:

using System;
using System.IO;
using System.Net;

// ... inside the code-behind class:
private void LoadProfilePage()
{
    string sURL = "http://www.abcd1234.com/abcd1234";

    WebRequest wrGETURL = WebRequest.Create(sURL);

    //WebProxy myProxy = new WebProxy("myproxy",80);
    //myProxy.BypassProxyOnLocal = true;

    //wrGETURL.Proxy = WebProxy.GetDefaultProxy();

    // Dispose the response, stream and reader so the connection is released
    using (WebResponse response = wrGETURL.GetResponse())
    using (Stream objStream = response.GetResponseStream())
    {
        if (objStream != null)
        {
            using (StreamReader objReader = new StreamReader(objStream))
            {
                // The whole page as a single HTML string
                string sLine = objReader.ReadToEnd();

                if (!String.IsNullOrEmpty(sLine))
                {
                    // ....
                }
            }
        }
    }
}

Tags: c# .net html parsing

5 answers

老娘就宠你

Post #2 · 2019-02-16 15:41

Take a look at the Html Agility Pack.

Example of its use:

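using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.Load("file.htm");
// SelectNodes returns null (not an empty collection) when nothing matches
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        HtmlAttribute att = link.Attributes["href"];
        att.Value = FixLink(att); // FixLink is the answerer's own helper
    }
}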

乱世女痞

Post #3 · 2019-02-16 15:41

I've used the HTML Agility Pack to do this exact thing and I think it's great. It has been really helpful to me.

何必那么认真

Post #4 · 2019-02-16 15:44

You can use the HTML Agility Pack and a little XPath (it can even download the document for you):

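using HtmlAgilityPack;

HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load("http://www.abcd1234.com/abcd1234");
// "//abc//tag" selects every <tag> nested anywhere under an <abc>;
// SelectNodes returns null when nothing matches
HtmlNodeCollection tags = doc.DocumentNode.SelectNodes("//abc//tag");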
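If you already have the page as a string (like sLine in the question), a minimal sketch, assuming the HtmlAgilityPack NuGet package, looks like this: LoadHtml parses the string, and chaining Descendants gives roughly the nested tag-name lookup the question asks for, without XPath. The names HtmlStringParser and FindNestedTagText are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumes the "HtmlAgilityPack" NuGet package is referenced

public static class HtmlStringParser
{
    // Hypothetical helper: returns the inner text of every <tag> element
    // found anywhere inside an <abc> element of the given HTML string.
    public static List<string> FindNestedTagText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parses a string; Load(...) reads a file or stream instead

        return doc.DocumentNode
                  .Descendants("abc")                         // every <abc> element
                  .SelectMany(abc => abc.Descendants("tag"))  // every <tag> under each <abc>
                  .Select(tag => tag.InnerText)
                  .ToList();
    }
}
```

Because the parser is tolerant, a void element such as an unclosed <br> between the tags does not break the lookup.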

我欲成王，谁敢阻挡

Post #5 · 2019-02-16 15:49

Maybe this helps: What is the best way to parse html in C#?

【Aperson】

Post #6 · 2019-02-16 15:55

You can use the excellent HTML Agility Pack.

This is an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT (you don't actually have to understand XPath or XSLT to use it, don't worry...). It is a .NET code library that lets you parse "out of the web" HTML files. The parser is very tolerant of "real world" malformed HTML. The object model is very similar to the one System.Xml provides, but for HTML documents (or streams).
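To illustrate why a tolerant parser matters, here is a sketch using only the built-in System.Xml.Linq (XDocument). It performs the same nested lookup, but only on well-formed XHTML: typical real-world HTML with an unclosed tag makes XDocument.Parse throw an XmlException. The names XhtmlDemo and FindTags are hypothetical, chosen for illustration.

```csharp
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class XhtmlDemo
{
    // Hypothetical helper: returns the text of every <tag> nested inside an <abc>,
    // or null when the markup is not well-formed XML.
    // Note: elements in an XHTML namespace would need a namespaced XName to match.
    public static string[] FindTags(string markup)
    {
        try
        {
            XDocument doc = XDocument.Parse(markup);
            return doc.Descendants("abc")   // every <abc> element
                      .Descendants("tag")   // every <tag> below those
                      .Select(e => e.Value)
                      .ToArray();
        }
        catch (XmlException)
        {
            // e.g. "<br>" without a closing tag is legal HTML but invalid XML
            return null;
        }
    }
}
```

For example, FindTags("<root><abc><tag>one</tag></abc></root>") returns a one-element array, while FindTags("<abc><tag>one<br></abc>") returns null.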
