How to write customize Downloader Middleware for s

2019-02-15 21:13发布

I am having issue communicating between selenium and scrapy object.

I am using selenium to login to some site, once I get that response I want to use scrape's functionaries to parse and process. Please can some one help me writing middleware so that every request should go through selenium web driver and response should be pass to scrapy.

Thank you!

标签： selenium scrapy

1条回答

淡お忘

2楼-- · 2019-02-15 21:30

It's pretty straightforward, create a middleware with a webdriver and use process_request to intercept the request, discard it and use the url it had to pass it to your selenium webdriver:

from scrapy.http import HtmlResponse
from selenium import webdriver


class DownloaderMiddleware(object):
    def __init__(self):
        self.driver = webdriver.Chrome()  # your chosen driver

    def process_request(self, request, spider):
        # only process tagged request or delete this if you want all
        if not request.meta.get('selenium'):
            return
        self.driver.get(request.url)
        body = self.driver.page_source
        response = HtmlResponse(url=self.driver.current_url, body=body)
        return response

The downside of this is that you have to get rid of the concurrency in your spider since selenium webdrive can only handle one url at a time. For that see settings documentation page.

0人赞添加讨论(0) 举报

How to write customize Downloader Middleware for s

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间