I am trying to scrape the webpage below:
http://www.houseoffraser.co.uk/Eliza+J+3/4+sleeve+ruched+waist+dress/165288648,default,pd.html
The stock data for each colour/size combination appears only when a colour or size is selected. Is it possible in R to simulate this selection to get the data?
So far, I have been able to capture the colours and sizes:
mcolour = toString(xpathSApply(page,'//ul[@class="colour-swatches-list toggle-panel"]//li[@title]',xmlGetAttr,"title"))
size = xpathSApply(page,'//ul[@class="size-swatches-list toggle-panel"]//li[@data-size]',xmlGetAttr,"data-size")
but I am not sure how to capture stock levels per colour/size combination.
Please advise!
============================================================
I could not find new among the methods. Am I missing anything?
firefoxClass
Generator for class "firefoxClass":
Class fields:
Name: exceptionTable javaWarMes javaDriver javaNavigate
Class: matrix ANY ANY ANY
Class Methods:
"back", "callSuper", "close", "copy", "export", "field", "findElementByClassName",
"findElementByCssSelector", "findElementById", "findElementByLinkText", "findElementByName",
"findElementByPartialLinkText", "findElementByTagName", "findElementByXPath",
"findElementsByClassName", "findElementsByCssSelector", "findElementsById",
"findElementsByLinkText", "findElementsByName", "findElementsByPartialLinkText",
"findElementsByTagName", "findElementsByXPath", "forward", "get", "getCapabilities",
"getClass", "getCurrentUrl", "getPageSource", "getRefClass", "getTitle", "getVersion",
"import", "initFields", "initialize", "initialize#exceptionClass", "printHtml", "refresh",
"show", "show#envRefClass", "trace", "tryExc", "untrace", "usingMethods"
Reference Superclasses:
"exceptionClass", "envRefClass"
For a given product ID (pid), which you can scrape from the page, you can get stock availability by querying:
http://www.houseoffraser.co.uk/on/demandware.store/Sites-hof-Site/default/Product-UpdateQuantityList?pid=165288698&quantity=1
You don't even need to set any cookies for that query. It returns a chunk of HTML and JavaScript that is used to set the control on the page. Here's an example of limited stock (currently 2, although I might have just bought all of them by accident):
http://www.houseoffraser.co.uk/on/demandware.store/Sites-hof-Site/default/Product-UpdateQuantityList?pid=165288648&quantity=1
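As a rough sketch of the query described above, the URL can be built from a pid in R. The function names stock_url and fetch_stock_chunk are hypothetical, and the fetch helper assumes the endpoint still answers a plain GET without cookies, which may no longer be true; it is defined here but not called.

```r
# Hypothetical helper: build the Product-UpdateQuantityList URL for a pid.
stock_url <- function(pid, quantity = 1) {
  sprintf(
    "http://www.houseoffraser.co.uk/on/demandware.store/Sites-hof-Site/default/Product-UpdateQuantityList?pid=%s&quantity=%d",
    pid, as.integer(quantity)
  )
}

# Hypothetical helper: download the HTML/JavaScript chunk as one string.
# Assumes the endpoint is reachable and needs no cookies (not verified).
fetch_stock_chunk <- function(pid, quantity = 1) {
  paste(readLines(stock_url(pid, quantity), warn = FALSE), collapse = "\n")
}

# Build the URLs for the two pids that appear in this thread.
urls <- vapply(c("165288648", "165288698"), stock_url, character(1))
print(urls)
```

Calling fetch_stock_chunk("165288648") would then return the raw chunk for further parsing, provided the site still behaves as described.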
You could get the number in stock by parsing either the availabilityMessage string or the <select> control.
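A minimal sketch of the message-parsing route, assuming the availabilityMessage text looks like the messages shown on the product page ("in stock", "in stock, only 17 left", "Out of stock"). The exact wording returned by the endpoint is an assumption, and parse_stock_left is a hypothetical helper name.

```r
# Hypothetical helper: pull the N out of "only N left" in a stock message.
# Returns NA when the message does not state an explicit count.
parse_stock_left <- function(msg) {
  matches <- regmatches(msg, regexec("only ([0-9]+) left", msg, ignore.case = TRUE))
  vapply(
    matches,
    function(m) if (length(m) == 2) as.integer(m[2]) else NA_integer_,
    integer(1)
  )
}

# Sample messages in the assumed format.
msgs <- c("in stock", "in stock, only 17 left", "Out of stock")
counts <- parse_stock_left(msgs)
print(counts)
```

An NA here only means that no count was stated; whether the item is available at all still has to be read from the message itself.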
The only step I've not worked out is getting the pid values and how you would map those to the descriptions, but that should all be on the page somewhere, unless it is being downloaded by Ajax requests (which is where the stock data comes from).
You are using the Chrome debugger/inspector, aren't you?
Here is an example using relenium, which you can easily extend to also query product colours:
require(relenium) # More info: https://github.com/LluisRamon/relenium
require(XML)
firefox <- firefoxClass$new() # init browser
firefox$get("http://www.houseoffraser.co.uk/Eliza+J+3/4+sleeve+ruched+waist+dress/165288648,default,pd.html") # open url
sizes <- xpathSApply(htmlParse(firefox$getPageSource()), "//ul[@class='size-swatches-list toggle-panel']/li/a", xmlValue) # read available sizes
stockMsg <- vector() # init stock message vector
for (size in sizes) { # for each available size
  sizeLink <- firefox$findElementByXPath(sprintf("//ul[@class='size-swatches-list toggle-panel']/li[@data-size='%s']", size)) # focus size link
  sizeLink$click() # click size link
  stockMsg <- c(stockMsg, # and append stock message to stock message vector
    firefox$findElementByXPath("/html/body/div/div[3]/div/div/div[4]/div/div/div/div/form/div[4]/div[4]/div")$getText()
  )
}
setNames(stockMsg, sizes) # name stock msg vector and print it
# 8 10
# "in stock" "in stock"
# 12 14
# "in stock" "in stock"
# 16 18
# "in stock" "in stock, only 17 left"
# 20 22
# "in stock, only 2 left" "in stock, only 2 left"
# 24 26
# "Out of stock" "Out of stock"
# 28
# "Out of stock"
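If a table is easier to work with than a named vector, the result can be reshaped along these lines. This is only a sketch: the stock vector below is retyped from the output printed above rather than scraped live, and the in_stock column name is an illustrative choice.

```r
# Named vector retyped from the printed output above (not scraped live).
stock <- c(
  "8" = "in stock", "10" = "in stock", "12" = "in stock", "14" = "in stock",
  "16" = "in stock", "18" = "in stock, only 17 left",
  "20" = "in stock, only 2 left", "22" = "in stock, only 2 left",
  "24" = "Out of stock", "26" = "Out of stock", "28" = "Out of stock"
)

# One row per size, with a logical flag derived from the message text.
stock_df <- data.frame(
  size = names(stock),
  message = unname(stock),
  in_stock = !grepl("out of stock", stock, ignore.case = TRUE),
  row.names = NULL,
  stringsAsFactors = FALSE
)
print(stock_df)
```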