C# HTML Agility Pack

Posted 2019-07-04 15:56

We are moving an e-commerce website to a new platform. All of its pages are static HTML and the product information is not stored in a database, so we must scrape the current website for the product descriptions.

Here is one of the pages: http://www.cabinplace.com/accrugsbathblackbear.htm

What is the best way to get the description into a string? Should I use the HTML Agility Pack, and if so, how would this be done? I am new to the HTML Agility Pack and to XHTML in general.

Thanks

1 answer
看我几分像从前
#2 · 2019-07-04 16:31

The HTML Agility Pack is a good library to use for this kind of work.

You did not indicate if all of the content is structured this way nor if you have already gotten the kind of fragment you posted from the HTML files, so it is difficult to advise further.

In general, if all pages are structured similarly, I would use an XPath expression to select the description paragraph on each page and read its InnerHtml or InnerText property.

Something like the following:

var description = htmlDoc.DocumentNode.SelectNodes("//p[@class='content_txt']")[0].InnerText;
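
A slightly fuller sketch of the same idea, in case it helps. It assumes the description sits in a paragraph with the class content_txt (taken from the one-liner above, not checked against the live pages) and that the HtmlAgilityPack NuGet package is referenced. The names DescriptionScraper and ExtractDescription are made up for illustration.

```csharp
using System;
using HtmlAgilityPack;

public static class DescriptionScraper
{
    // Returns the text of the first <p class="content_txt"> in the given HTML,
    // or null when no such paragraph exists.
    // Assumption: "content_txt" is the class used for product descriptions.
    public static string ExtractDescription(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // The leading "//" makes the XPath search the whole document,
        // not just the direct children of the root node.
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//p[@class='content_txt']");
        if (node == null)
        {
            return null; // page does not follow the expected layout
        }

        // InnerText leaves entities such as &amp; encoded, so decode them,
        // then trim surrounding whitespace.
        return HtmlEntity.DeEntitize(node.InnerText).Trim();
    }

    public static void Main()
    {
        // Hypothetical fragment standing in for one of the static pages.
        string html = "<html><body><p class='content_txt'>Black bear bath rug &amp; mat.</p></body></html>";
        Console.WriteLine(ExtractDescription(html));
    }
}
```

To run this against the live pages rather than a string, the document can be fetched with new HtmlWeb().Load(url) and queried the same way. Checking for null before reading InnerText matters here, because any page that deviates from the assumed layout would otherwise throw.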