Scrapy can't crawl all links on a page

Posted 2019-03-02 13:43

I am trying to use Scrapy to crawl an AJAX website: http://play.google.com/store/apps/category/GAME/collection/topselling_new_free

I want to get all the links directing to each game.

I inspected the elements of the page (screenshot omitted), and I want to extract all links matching the pattern /store/apps/details?id=

But when I run the selector in the Scrapy shell, it returns nothing (screenshot of the shell command omitted).

I've also tried //a/@href, which didn't work either. I don't know what's going wrong.
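For reference, this is the kind of extraction I'd expect to work once the link markup is actually present in the response body (a minimal stdlib sketch against a hypothetical HTML snippet, since the real page is populated by JavaScript and the snippet below is not the actual page source):

```python
import re

# Hypothetical fragment of the rendered page; the real page fills this
# in via AJAX, which is why the Scrapy shell sees nothing.
html = '''
<a href="/store/apps/details?id=com.example.game1">Game 1</a>
<a href="/store/apps/details?id=com.example.game2">Game 2</a>
<a href="/store/apps/category/GAME">Games</a>
'''

# Keep only hrefs that point at an app detail page.
links = re.findall(r'href="(/store/apps/details\?id=[^"]+)"', html)
print(links)
# -> ['/store/apps/details?id=com.example.game1',
#     '/store/apps/details?id=com.example.game2']
```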

  • Update: after modifying the start URL and adding 'formdata' as someone suggested, I can now crawl the first 120 links, but no more links after that.

Can someone help me with this?

1 Answer

放我归山 · 2019-03-02 14:14

The data on that page is actually populated by an AJAX POST request, so you won't see it in the Scrapy shell. Instead of inspecting elements, check the Network tab in your browser's developer tools; there you will find the request.

Make a POST request to the URL https://play.google.com/store/apps/category/GAME/collection/topselling_new_free?authuser=0 with formdata={'start':'0','num':'60','numChildren':'0','ipf':'1','xhr':'1'}.

Increment start by 60 on each request to get the paginated results.
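The pagination described above can be sketched as follows. The form-field names and the step of 60 come from the answer; the helper name `page_formdata` and the commented Scrapy usage are my own illustration, not verified against the live site:

```python
def page_formdata(page):
    """Form data for one AJAX POST request; 'start' advances by 60 per page."""
    return {
        'start': str(page * 60),   # 0, 60, 120, ...
        'num': '60',
        'numChildren': '0',
        'ipf': '1',
        'xhr': '1',
    }

# In a Scrapy spider you would issue one FormRequest per page, e.g.
# (sketch only, assuming the endpoint from the answer still behaves this way):
#
#   yield scrapy.FormRequest(
#       'https://play.google.com/store/apps/category/GAME/collection/'
#       'topselling_new_free?authuser=0',
#       formdata=page_formdata(page),
#       callback=self.parse,
#   )

# The first three requests would carry these 'start' values:
starts = [page_formdata(p)['start'] for p in range(3)]
print(starts)
# -> ['0', '60', '120']
```

This also explains the questioner's symptom: two pages of 60 results each give exactly the 120 links they observed, so the fix is to keep issuing requests with larger `start` values until a response comes back empty.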
