Web scraping Oracle (ATG) Commerce

Posted 2019-08-29 05:29

I am new to web scraping, and I use the following tools and method to scrape:

  • I use R (with packages Curl, XML, etc.) to read the web pages (given a URL), and the htmlTreeParse function to parse the HTML page.
  • Then, to find the data I want, I first use the developer tools in Chrome to inspect the code.
  • Once I know which node holds the data, I use xpathApply to extract it.
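
The steps above can be sketched roughly as follows. This is only a minimal illustration, assuming the XML package is installed; the inline HTML, the class names, and the XPath expressions are made up for the example and are not taken from any real site.

```r
library(XML)

# Inline HTML stands in for a downloaded page; this markup is hypothetical.
page <- '<html><body>
  <div class="product"><span class="name">Alpha</span><span class="price">10</span></div>
  <div class="product"><span class="name">Beta</span><span class="price">25</span></div>
</body></html>'

# useInternalNodes = TRUE is needed so that XPath queries can run on the result.
doc <- htmlTreeParse(page, asText = TRUE, useInternalNodes = TRUE)

# xpathApply returns a list with one element per matching node;
# xmlValue extracts the text content of each node.
product_names <- unlist(xpathApply(doc, "//div[@class='product']/span[@class='name']", xmlValue))
prices <- as.numeric(unlist(xpathApply(doc, "//div[@class='product']/span[@class='price']", xmlValue)))

products <- data.frame(name = product_names, price = prices, stringsAsFactors = FALSE)
print(products)
```

For a real page, the XPath expressions would come from what the Chrome developer tools show for the nodes of interest.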

Usually, it works well. But I had an issue with this site: http://www.sephora.fr/Parfum/Parfum-Femme/C309/2

  • When you click on the link, the page loads, but it is actually page 1 of the products.
  • You have to load the URL again (by entering it a second time) to get page 2.
  • When I use my usual process to read the data, the htmlTreeParse function always gives me page 1.

I tried to understand this website better:

This doesn't tell me which selection I made.

Could you please help:

  • How can I access more products?

Thank you

1 answer
爱情/是我丢掉的垃圾
#2 · 2019-08-29 06:12

I found the solution: Selenium! I think it is the ultimate tool for web scraping. I have posted several questions about web scraping, and now, with RSelenium, almost everything is possible.
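
For anyone who wants a starting point, here is a rough sketch of what this could look like. It is not the exact code I used; it assumes the RSelenium package, a local Firefox, and a matching driver are available. The helper name fetch_rendered_html, the loads argument, and the port number are made up for this example. Loading the URL twice mirrors the behaviour described in the question, and whether that is enough to reach page 2 inside a Selenium session is an assumption.

```r
# Sketch only: assumes RSelenium is installed and Firefox can be started locally.
# fetch_rendered_html is a hypothetical helper, not part of RSelenium.
fetch_rendered_html <- function(url, loads = 2L, port = 4545L) {
  driver <- RSelenium::rsDriver(browser = "firefox", port = port, verbose = FALSE)

  # Always close the browser and stop the Selenium server, even on error.
  on.exit({
    driver$client$close()
    driver$server$stop()
  }, add = TRUE)

  browser <- driver$client

  # Load the same URL several times within one browser session,
  # as the question reports doing by hand to reach page 2.
  for (i in seq_len(loads)) {
    browser$navigate(url)
  }

  # getPageSource() returns a list; the first element is the HTML string.
  browser$getPageSource()[[1]]
}

# Usage (not run here, since it starts a real browser):
# html <- fetch_rendered_html("http://www.sephora.fr/Parfum/Parfum-Femme/C309/2")
# doc  <- XML::htmlParse(html, asText = TRUE)
```

The returned HTML string can then be parsed and queried with xpathApply in the usual way, since the browser has already done the page loading.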
