simulate clicking link on web page

Posted 2019-07-22 05:09

Question:

I am trying to scrape the web page below:

http://www.houseoffraser.co.uk/Eliza+J+3/4+sleeve+ruched+waist+dress/165288648,default,pd.html

The stock data for each colour/size combination appears only when a colour or size is selected. Is it possible to simulate this selection in R to get the data?

So far, I have been able to capture the colours and sizes:

mcolour = toString(xpathSApply(page,'//ul[@class="colour-swatches-list toggle-panel"]//li[@title]',xmlGetAttr,"title"))

size = xpathSApply(page,'//ul[@class="size-swatches-list toggle-panel"]//li[@data-size]',xmlGetAttr,"data-size")

but I am not sure how to capture the stock levels per colour/size combination.

Please advise!

============================================================

I could not find new as a method. Am I missing anything?

firefoxClass
Generator for class "firefoxClass":

Class fields:

Name:  exceptionTable     javaWarMes     javaDriver   javaNavigate
Class:         matrix            ANY            ANY            ANY

Class Methods:  
"back", "callSuper", "close", "copy", "export", "field", "findElementByClassName", 
 "findElementByCssSelector", "findElementById", "findElementByLinkText",  "findElementByName", 
 "findElementByPartialLinkText", "findElementByTagName", "findElementByXPath", 
 "findElementsByClassName", "findElementsByCssSelector", "findElementsById", 
 "findElementsByLinkText", "findElementsByName", "findElementsByPartialLinkText", 
 "findElementsByTagName", "findElementsByXPath", "forward", "get", "getCapabilities", 
 "getClass", "getCurrentUrl", "getPageSource", "getRefClass", "getTitle", "getVersion", 
  "import", "initFields", "initialize", "initialize#exceptionClass", "printHtml",   "refresh", 
  "show", "show#envRefClass", "trace", "tryExc", "untrace", "usingMethods"


  Reference Superclasses:  
  "exceptionClass", "envRefClass"

Answer 1:

For a given product ID pid which you can scrape from the page, you can get stock availability by querying:

http://www.houseoffraser.co.uk/on/demandware.store/Sites-hof-Site/default/Product-UpdateQuantityList?pid=165288698&quantity=1

You don't even need to set any cookies for that query. It returns a chunk of HTML and JavaScript that is used to set the control on the page. Here's an example with limited stock (currently 2, although I might have just bought all of them by accident):

http://www.houseoffraser.co.uk/on/demandware.store/Sites-hof-Site/default/Product-UpdateQuantityList?pid=165288648&quantity=1

You could get the number in stock by either parsing the availabilityMessage string or the <select> control.
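
As a rough illustration of the query described above, here is a minimal sketch in base R. The helper names stock_url, fetch_stock_chunk and max_select_quantity are hypothetical, and the assumption that the <select> control contains options of the form <option value="N"> is a guess about the response markup, not something confirmed here; inspect the actual response before relying on it.

```r
# Hypothetical helper: build the stock query URL for one product ID.
stock_url <- function(pid, quantity = 1) {
  sprintf(
    "http://www.houseoffraser.co.uk/on/demandware.store/Sites-hof-Site/default/Product-UpdateQuantityList?pid=%s&quantity=%d",
    pid, as.integer(quantity)
  )
}

# Hypothetical helper: download the raw HTML/JavaScript chunk as one string.
# Needs network access, so it is defined here but not called.
fetch_stock_chunk <- function(pid) {
  paste(readLines(stock_url(pid), warn = FALSE), collapse = "\n")
}

# Hypothetical helper: largest quantity offered by the <select> control,
# ASSUMING its options look like <option value="1">, <option value="2">, ...
max_select_quantity <- function(chunk) {
  m <- regmatches(chunk, gregexpr('<option[^>]*value="[0-9]+"', chunk))[[1]]
  if (length(m) == 0) return(NA_integer_)
  max(as.integer(sub('.*value="([0-9]+)"$', "\\1", m)))
}
```

With network access, max_select_quantity(fetch_stock_chunk("165288648")) would then give an upper bound on the quantity you can order, provided the markup assumption holds.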

The only step I've not worked out is how to get the pid values and map them to the descriptions, but that should all be on the page somewhere, unless it is downloaded by Ajax requests (which is where the stock data comes from).

You are using the Chrome debugger/inspector, aren't you?



Answer 2:

Here is an example using relenium, which you can easily extend to also query product colours:

require(relenium) # More info: https://github.com/LluisRamon/relenium
require(XML)
firefox <- firefoxClass$new() # init browser
firefox$get("http://www.houseoffraser.co.uk/Eliza+J+3/4+sleeve+ruched+waist+dress/165288648,default,pd.html") # open url
sizes <- xpathSApply(htmlParse(firefox$getPageSource()), "//ul[@class='size-swatches-list toggle-panel']/li/a", xmlValue) # read available sizes

stockMsg <- vector() # init stock message vector
for (size in sizes) { # for each available size
  sizeLink <- firefox$findElementByXPath(sprintf("//ul[@class='size-swatches-list toggle-panel']/li[@data-size='%s']", size)) # focus size link
  sizeLink$click() # click size link
  stockMsg <- c(stockMsg, # and append stock message to stock message vector
                firefox$findElementByXPath("/html/body/div/div[3]/div/div/div[4]/div/div/div/div/form/div[4]/div[4]/div")$getText()
                )
}
setNames(stockMsg, sizes) # name stock msg vector and print it
# 8                       10 
# "in stock"               "in stock" 
# 12                       14 
# "in stock"               "in stock" 
# 16                       18 
# "in stock" "in stock, only 17 left" 
# 20                       22 
# "in stock, only 2 left"  "in stock, only 2 left" 
# 24                       26 
# "Out of stock"           "Out of stock" 
# 28 
# "Out of stock"
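
If numeric stock levels are needed rather than messages, the named vector could be post-processed along these lines. This is only a sketch: parse_stock_left is a hypothetical helper, the sample messages are copied from the output above, and it assumes the wording always follows the "only N left" pattern seen there.

```r
# A few stock messages copied from the output above, named by size.
stockMsg <- c("8"  = "in stock",
              "18" = "in stock, only 17 left",
              "20" = "in stock, only 2 left",
              "24" = "Out of stock")

# Hypothetical helper: extract N from "only N left"; NA when no count is shown.
parse_stock_left <- function(msg) {
  has_count <- grepl("only [0-9]+ left", msg)
  left <- rep(NA_integer_, length(msg))
  left[has_count] <- as.integer(sub(".*only ([0-9]+) left.*", "\\1", msg[has_count]))
  left
}

# One row per size: whether it is in stock and, if shown, how many are left.
stock <- data.frame(
  size     = names(stockMsg),
  in_stock = grepl("^in stock", stockMsg, ignore.case = TRUE),
  left     = parse_stock_left(stockMsg),
  row.names = NULL
)
print(stock)
```

An NA in the left column means the page did not state a count, which is the case both for plain "in stock" and for "Out of stock"; the in_stock column distinguishes the two.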