I'm new to Scrapy-splash and I'm trying to scrape a lazy DataTable, i.e. a table with AJAX pagination. I need to load the page, wait until the JS has executed, grab the HTML of the table, and then click the "Next" button in the pagination. My approach works, but I'm afraid I'm requesting the website twice: first when I yield the SplashRequest, and then again when the lua_script is executed. Is that true? If so, how can I make it perform the request just once?
import scrapy
from lxml import etree
from lxml.etree import HTMLParser
from scrapy_splash import SplashRequest


class JSSpider(scrapy.Spider):
    name = 'js_spider'

    script = """
    function main(splash, args)
        splash:go(args.url)
        splash:wait(0.5)
        local page_one = splash:evaljs("$('#example').html()")
        splash:evaljs("$('#example_next').click()")
        splash:wait(2)
        local page_two = splash:evaljs("$('#example').html()")
        return {page_one=page_one, page_two=page_two}
    end"""

    def start_requests(self):
        url = "https://datatables.net/examples/server_side/defer_loading.html"
        yield SplashRequest(url, endpoint='execute', callback=self.parse,
                            args={'wait': 0.5, 'lua_source': self.script, 'url': url})

    def parse(self, response):
        # assert isinstance(response, SplashTextResponse)
        page_one = response.data.get('page_one', None)
        page_one_root = etree.fromstring(page_one, HTMLParser())
        page_two = response.data.get('page_two', None)
        page_two_root = etree.fromstring(page_two, HTMLParser())
EDIT
Also, I would like to wait for the AJAX request to finish in a better way than a fixed splash:wait(2). Is it possible to somehow wait until the table has changed, ideally with a timeout?
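Something like the following is what I have in mind, a rough, untested sketch that would replace the script attribute in the spider above. It assumes jQuery is available on the page and simply polls the table's HTML until it differs from the previous snapshot or a timeout elapses:

-- Untested sketch: poll until #example's HTML changes, giving up after `timeout` seconds.
function wait_for_table_change(splash, old_html, timeout, poll_interval)
    local elapsed = 0
    while elapsed < timeout do
        splash:wait(poll_interval)
        elapsed = elapsed + poll_interval
        local new_html = splash:evaljs("$('#example').html()")
        if new_html ~= old_html then
            return new_html   -- the table re-rendered
        end
    end
    return nil                -- timed out, the table never changed
end

function main(splash, args)
    splash:go(args.url)
    splash:wait(0.5)
    local page_one = splash:evaljs("$('#example').html()")
    splash:evaljs("$('#example_next').click()")
    local page_two = wait_for_table_change(splash, page_one, 5.0, 0.1)
    return {page_one=page_one, page_two=page_two}
end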
The Lua script is very literal: if you have one splash:go, then one request is made by one Splash worker. Your crawler is fine here.
To pointlessly nitpick, though: your spider connects to a worker via HTTP, so in theory two requests are being made, the first from your spider to the Splash service and the second from the Splash worker to the target site.
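To make that two-request picture concrete, here is a minimal sketch of roughly what the scrapy-splash middleware does for you behind the scenes, assuming Splash is running locally on port 8050 (the trimmed-down script is just a placeholder for illustration):

import requests

# Trimmed-down script: the single splash:go here is the only fetch of the target site.
LUA = """
function main(splash, args)
    splash:go(args.url)
    splash:wait(0.5)
    return splash:html()
end
"""

# Request #1: the client (your spider) talks to the Splash service over HTTP.
resp = requests.post(
    "http://localhost:8050/execute",
    json={
        "lua_source": LUA,
        # Available inside the script as args.url; fetching it is request #2,
        # made by the Splash worker, not by your spider.
        "url": "https://datatables.net/examples/server_side/defer_loading.html",
    },
)
print(resp.text[:200])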