I am trying to use Scrapy to crawl an AJAX website: http://play.google.com/store/apps/category/GAME/collection/topselling_new_free
I want to get all the links pointing to the individual games.
I inspected the page's elements, and I want to extract all links matching the pattern /store/apps/details?id=
But when I ran my selector in the Scrapy shell, it returned nothing.
I've also tried //a/@href, but that didn't work either. I don't know what is going wrong.
- Update: I can now crawl the first 120 links after modifying the start URL and adding 'formdata', as someone suggested, but I get no more links after that.
Can someone help me with this?
It's actually an AJAX POST request that populates the data on that page, so you won't get it in the Scrapy shell. Instead of inspecting the element, check the Network tab of your browser's developer tools; you will find the request there. Make a POST request to
https://play.google.com/store/apps/category/GAME/collection/topselling_new_free?authuser=0
with formdata={'start':'0','num':'60','numChildren':'0','ipf':'1','xhr':'1'}
Increment start by 60 (the num value) on each request to get the next page of results.