How to write a custom Downloader Middleware for s

Posted 2019-02-15 21:13

I am having trouble getting Selenium and Scrapy to work together.

I am using Selenium to log in to a site. Once I get the response, I want to use Scrapy's functionality to parse and process it. Could someone please help me write a middleware so that every request goes through the Selenium web driver and the response is passed back to Scrapy?

Thank you!

1 answer
淡お忘
#2 · 2019-02-15 21:30

It's pretty straightforward: create a middleware that holds a webdriver and use process_request to intercept each request. Instead of letting Scrapy download it, pass the request's URL to your Selenium webdriver and return the page it renders as the response:

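from scrapy.http import HtmlResponse
from selenium import webdriver


class DownloaderMiddleware(object):
    def __init__(self):
        # One shared browser instance; swap in whichever driver you use.
        self.driver = webdriver.Chrome()

    def process_request(self, request, spider):
        # Only handle requests tagged with meta={'selenium': True};
        # remove this check to route every request through Selenium.
        if not request.meta.get('selenium'):
            return None  # let Scrapy download the request as usual
        self.driver.get(request.url)
        body = self.driver.page_source  # rendered HTML as a str
        # A str body needs an explicit encoding; passing the request keeps
        # response.meta available in the spider callback.
        return HtmlResponse(
            url=self.driver.current_url,
            body=body,
            encoding='utf-8',
            request=request,
        )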

The downside is that you have to give up concurrency in your spider, since a single Selenium webdriver can only handle one URL at a time. To limit it, see the CONCURRENT_REQUESTS setting on the Scrapy settings documentation page.
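
As a rough sketch, the relevant settings.py entries could look like the following. The module path myproject.middlewares is an assumption about where the class lives, and 543 is an arbitrary priority picked for illustration; adjust both to your project.

```python
# settings.py (sketch; "myproject.middlewares" is a hypothetical module path)

# Register the middleware; 543 is an arbitrary priority chosen for illustration.
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.DownloaderMiddleware": 543,
}

# Hand the single Selenium driver one request at a time.
CONCURRENT_REQUESTS = 1
```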
