How to add attributes to a request in a Scrapy contract

A Scrapy contract fails if the callback instantiates an Item or ItemLoader from the meta attribute of the Request() object passed on by a previous parse method.

I was thinking of overriding ScrapesContract to preprocess the request and load some dummy values into request.meta, but I'm not sure whether that is good practice.

I have seen the pre_process method in the docs (illustrated by HasHeaderContract at the bottom of the contracts page), where it reads attributes from the response object, but I'm not sure whether it can also be used to set attributes.

EDIT: More details. Methods from an example crawler:

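from scrapy import Request

def parse_level_one(self, response):
    # ... build and populate `loader`, and pick the next `url` ...
    return Request(url=url, callback=self.parse_level_two,
                   meta={'loader': loader.load_item()})

def parse_level_two(self, response):
    """Parse product detail page

    @url http://example.com
    @scrapes some_field1 some_field2
    """
    # Resume loading into the item handed over by parse_level_one
    loader = MyItemLoader(response.meta['loader'], response=response)
    # ... add the remaining fields, then return loader.load_item() ...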

Running the check from the CLI:

$ scrapy check crawlername
Traceback... loader = MyItemLoader(response.meta['loader'], response=response)
KeyError: 'loader'

The idea I have in mind is this:

from scrapy.contracts import Contract
from scrapy.exceptions import ContractFail
from scrapy.item import BaseItem


class LoadedScrapesContract(Contract):
    """ Contract to check presence of fields in scraped items
        @loadedscrapes page_name page_body
    """

    name = 'loadedscrapes'

    def pre_process(self, response):
        # Meddle with the response object here to add a meta attribute,
        # like an empty Item() or dict, just to make the ItemLoader
        # instantiation pass.
        pass

    # this is the same as ScrapesContract
    def post_process(self, output):
        for x in output:
            if isinstance(x, BaseItem):
                for arg in self.args:
                    if arg not in x:
                        raise ContractFail("'%s' field is missing" % arg)

Tags: python scrapy

1 answer

Answer by 时光不老，我们不散 · post #2 · 2019-06-02 17:05

The best solution I've found is to do the following in the callback rather than mucking about with the contract:

loader = MyItemLoader(response.meta.get('loader', MyItem()), response=response)
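For illustration, the fallback can be tried with plain dicts standing in for response.meta. MyItem here is a hypothetical stand-in (a dict subclass) for the spider's real item class, and seed_item is a made-up helper that performs the same lookup as the line above.

```python
class MyItem(dict):
    """Stand-in for the spider's real Item class (hypothetical)."""


def seed_item(meta):
    # Same lookup as in the answer: reuse the item handed over by
    # parse_level_one, or start from an empty MyItem when meta lacks it
    # (as it does for the Request that `scrapy check` builds from @url).
    return meta.get('loader', MyItem())


crawl_meta = {'loader': MyItem(some_field1='from level one')}
check_meta = {}

print(seed_item(crawl_meta))  # {'some_field1': 'from level one'}
print(seed_item(check_meta))  # {}
```

The normal crawl keeps its partially filled item, while the contract run starts from an empty one instead of raising KeyError.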

I prefer this approach, but to stick to the question: override adjust_request_args in a custom contract.
