How to extract a URL's title, images and description using HTML Agility utility

Posted 2019-10-17 09:09

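I want to extract the title, description and images from a URL using the HTML Agility utility. So far I have not been able to find an example that is easy to understand and shows how to do this.

I would appreciate it if someone could help with an example, so that I can extract the title and description and let the user choose one image from the page's images (similar to what Facebook does when you share a link).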

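Update:

I have placed labels for the title and description, a button and a textbox on the .aspx page, and I run the following code in the button's click event. It returns null for all the values. I am probably doing something wrong.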

I am using the following sample URL: http://edition.cnn.com/2012/10/31/world/asia/india/index.html?hpt=hp_t2

protected void btnGetURLDetails_Click(object sender, EventArgs e)
{
    HtmlDocument doc = new HtmlDocument();
    var response = txtURL.Text;
    doc.LoadHtml(response);

    String title = (from x in doc.DocumentNode.Descendants()
                    where x.Name.ToLower() == "title"
                    select x.InnerText).FirstOrDefault();

    String desc = (from x in doc.DocumentNode.Descendants()
                   where x.Name.ToLower() == "description"
                   select x.InnerText).FirstOrDefault();

    List<String> imgs = (from x in doc.DocumentNode.Descendants()
                         where x.Name.ToLower() == "img"
                         select x.Attributes["src"].Value).ToList<String>();

    lblTitle.Text = title;
    lblDescription.Text = desc;
}

The code above gives me null values for all the variables.

If I change the loading part of the code to this:

HtmlDocument doc = new HtmlDocument();
var url = txtURL.Text;

var webGet = new HtmlWeb();
doc = webGet.Load(url);

then I get a value for the title only; the description is still null.

Answer 1:

// Required namespaces: System, System.Collections.Generic, System.IO,
// System.Linq, System.Net, HtmlAgilityPack
protected void btnGetURLDetails_Click(object sender, EventArgs e)
{
    // Download the page first; LoadHtml expects markup, not a URL.
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(new Uri(txtURL.Text));
    request.Method = WebRequestMethods.Http.Get;

    String responseString;
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (StreamReader reader = new StreamReader(response.GetResponseStream()))
    {
        responseString = reader.ReadToEnd();
    }

    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(responseString);

    String title = (from x in doc.DocumentNode.Descendants()
                    where x.Name.ToLower() == "title"
                    select x.InnerText).FirstOrDefault();

    // The description is not an element of its own; it is the content
    // attribute of <meta name="description" ...>.
    String desc = (from x in doc.DocumentNode.Descendants()
                   where x.Name.ToLower() == "meta"
                   && x.Attributes["name"] != null
                   && x.Attributes["name"].Value.ToLower() == "description"
                   && x.Attributes["content"] != null
                   select x.Attributes["content"].Value).FirstOrDefault();

    // Skip <img> tags without a src attribute to avoid a NullReferenceException.
    List<String> imgs = (from x in doc.DocumentNode.Descendants()
                         where x.Name.ToLower() == "img"
                         && x.Attributes["src"] != null
                         select x.Attributes["src"].Value).ToList<String>();

    lblTitle.Text = title;
    lblDescription.Text = desc;
}
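
For the image picker mentioned in the question, the src values usually need to be turned into absolute URLs before they can be shown to the user, because many pages use relative paths. The following is only a minimal sketch of one way to do that, not part of the original answer. PagePreview and FromHtml are hypothetical names introduced here for illustration (they are not part of HtmlAgilityPack), and the XPath for the description assumes the attribute value is written in lowercase as name="description".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper class for illustration; not part of HtmlAgilityPack.
public sealed class PagePreview
{
    public string Title { get; set; }
    public string Description { get; set; }
    public List<string> Images { get; set; }

    // html: the downloaded markup; baseUri: the page address, used to
    // resolve relative image paths.
    public static PagePreview FromHtml(string html, Uri baseUri)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when nothing matches.
        HtmlNode titleNode = doc.DocumentNode.SelectSingleNode("//title");
        // Assumption: the page writes name="description" in lowercase.
        HtmlNode descNode = doc.DocumentNode.SelectSingleNode("//meta[@name='description']");
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection imgNodes = doc.DocumentNode.SelectNodes("//img[@src]");

        var images = new List<string>();
        if (imgNodes != null)
        {
            foreach (HtmlNode img in imgNodes)
            {
                string src = img.GetAttributeValue("src", "");
                Uri absolute;
                // Resolve relative paths such as "/a.png" against the page address.
                if (src.Length > 0 && Uri.TryCreate(baseUri, src, out absolute))
                {
                    images.Add(absolute.AbsoluteUri);
                }
            }
        }

        return new PagePreview
        {
            // InnerText keeps entities such as &amp;, so decode them for display.
            Title = titleNode == null ? null : HtmlEntity.DeEntitize(titleNode.InnerText).Trim(),
            Description = descNode == null ? null : descNode.GetAttributeValue("content", ""),
            Images = images.Distinct().ToList()
        };
    }
}
```

In the click handler above, this could be called as PagePreview.FromHtml(responseString, new Uri(txtURL.Text)), and the resulting Images list bound to whatever control presents the choices to the user.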



Source: how to extract a url's title, images and description using HTML Agility utility