c# html agility pack

Posted 2019-07-04 15:56

We are moving an e-commerce website to a new platform. Because all of its pages are static HTML and not all of the product information is in a database, we must scrape the current website for the product descriptions.

Here is one of the pages: http://www.cabinplace.com/accrugsbathblackbear.htm

What is the best way to get the description into a string? Should I use the HTML Agility Pack, and if so, how would this be done? I am new to the HTML Agility Pack and to XHTML in general.

Thanks

Tags: c# html parsing html-agility-pack pack

1 answer

看我几分像从前

Reply #2 · 2019-07-04 16:31

The HTML Agility Pack is a good library to use for this kind of work.

You did not say whether all of the content is structured this way, or whether you have already extracted the kind of fragment you posted from the HTML files, so it is difficult to advise further.

In general, if all pages are structured similarly, I would use an XPath expression to select the description paragraph and read its InnerHtml or InnerText property on each page.

Something like the following:

// htmlDoc: an HtmlAgilityPack.HtmlDocument already loaded with the page's HTML
var description = htmlDoc.DocumentNode.SelectSingleNode("//p[@class='content_txt']").InnerText;
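A slightly fuller sketch of the same idea, applied to every page. It assumes that each product page keeps its description in a paragraph with the class content_txt (the class used in the example above) and that the old pages have been saved as .htm files in one local folder. DescriptionScraper, ExtractDescription and ExtractAll are hypothetical names, and the code needs the HtmlAgilityPack NuGet package.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

public static class DescriptionScraper
{
    // Assumed XPath: adjust it if some pages use different markup.
    public const string DescriptionXPath = "//p[@class='content_txt']";

    // Returns the plain-text description from one page's HTML,
    // or null if the expected paragraph is missing.
    public static string ExtractDescription(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null (rather than throwing) when nothing matches.
        var node = doc.DocumentNode.SelectSingleNode(DescriptionXPath);
        if (node == null)
            return null;

        // InnerText may leave entities such as &amp; encoded, so decode them.
        return HtmlEntity.DeEntitize(node.InnerText).Trim();
    }

    // Hypothetical batch helper: walks a folder of saved .htm pages and maps
    // each file name to its description; pages without one are skipped.
    public static Dictionary<string, string> ExtractAll(string folder)
    {
        var results = new Dictionary<string, string>();
        foreach (var path in Directory.EnumerateFiles(folder, "*.htm"))
        {
            var description = ExtractDescription(File.ReadAllText(path));
            if (description != null)
                results[Path.GetFileName(path)] = description;
        }
        return results;
    }
}
```

SelectSingleNode is used here instead of SelectNodes(...)[0] because, by default, SelectNodes returns null rather than an empty collection when nothing matches, so indexing into it throws a NullReferenceException on any page that lacks the paragraph.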
