A Scrapy contract fails if the callback instantiates an Item or ItemLoader from the meta attribute of the Request() object passed in from a previous parse method.
I was thinking of overriding ScrapesContract to preprocess the request and load some dummy values into request.meta, though I'm not sure that is good practice.
I have seen the pre_process method in the docs (illustrated in the HasHeaderContract at the bottom), which can read attributes from the request object, but I'm not sure whether it can also be used to set them.
EDIT: More details. Methods from an example crawler:
    def parse_level_one(self, response):
        # populate loader
        return Request(url=url, callback=self.parse_level_two,
                       meta={'loader': loader.load_item()})
    def parse_level_two(self, response):
        """Parse product detail page

        @url http://example.com
        @scrapes some_field1 some_field2
        """
        loader = MyItemLoader(response.meta['loader'], response=response)
In the CLI:

    $ scrapy check crawlername
    Traceback ...
      loader = MyItemLoader(response.meta['loader'], response=response)
    KeyError: 'loader'
The idea that I am thinking about is this:
    from scrapy.contracts import Contract
    from scrapy.exceptions import ContractFail
    from scrapy.item import BaseItem

    class LoadedScrapesContract(Contract):
        """Contract to check presence of fields in scraped items
        @loadedscrapes page_name page_body
        """

        name = 'loadedscrapes'

        def pre_process(self, response):
            # MEDDLE WITH THE RESPONSE OBJECT HERE
            # TO ADD A META ATTRIBUTE TO RESPONSE,
            # LIKE AN EMPTY Item() or dict, JUST TO MAKE
            # THE ITEM LOADER INSTANTIATION PASS
            pass

        # this is the same as ScrapesContract
        def post_process(self, output):
            for x in output:
                if isinstance(x, BaseItem):
                    for arg in self.args:
                        if arg not in x:
                            raise ContractFail("'%s' field is missing" % arg)
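To show what the pre_process body could do, here is a stand-alone sketch of the idea in plain Python, with no Scrapy dependency: FakeRequest and FakeResponse are hypothetical stand-ins for the real Scrapy objects, and the sketch assumes (as in Scrapy) that response.meta proxies request.meta and that response.request is populated during contract checks.

```python
class FakeRequest:
    """Stand-in for scrapy.Request: just carries a meta dict."""
    def __init__(self):
        self.meta = {}


class FakeResponse:
    """Stand-in for scrapy.http.Response."""
    def __init__(self, request):
        self.request = request

    @property
    def meta(self):
        # mirrors Scrapy, where response.meta proxies request.meta
        return self.request.meta


def pre_process(response):
    # seed a placeholder item so a later
    # MyItemLoader(response.meta['loader'], ...) call
    # does not raise KeyError during `scrapy check`
    response.request.meta.setdefault('loader', {})


resp = FakeResponse(FakeRequest())
pre_process(resp)
# resp.meta['loader'] is now an empty dict the callback can load into
```

Whether the real pre_process hook is invoked early enough for this to help during `scrapy check` is exactly the open question above; the sketch only illustrates the mutation itself.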
The best solution I've found for this is the following, rather than mucking up the contract.
I prefer that method but, to stick to the question, you can override adjust_request_args.
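As a minimal stand-alone sketch of that override (plain Python, no Scrapy import): the Contract class below is a stub for scrapy.contracts.Contract, whose real adjust_request_args hook receives the keyword arguments that will be passed to Request() and may return a modified dict. The 'loader' key and the empty-dict placeholder are assumptions matching the example spider above.

```python
class Contract:
    """Stub for scrapy.contracts.Contract (the real one lives in Scrapy)."""
    def adjust_request_args(self, args):
        return args


class DummyMetaContract(Contract):
    name = 'dummy_meta'

    def adjust_request_args(self, args):
        # args are the kwargs the contract machinery will pass to Request();
        # seed meta with an empty placeholder item so the callback's
        # response.meta['loader'] lookup succeeds during `scrapy check`
        meta = dict(args.get('meta') or {})
        meta.setdefault('loader', {})
        args['meta'] = meta
        return args


args = DummyMetaContract().adjust_request_args({'url': 'http://example.com'})
# args now carries meta={'loader': {}} alongside the original url
```

The appeal of this hook over pre_process is that it runs before the Request is even constructed, so the dummy meta is in place for the whole request/response cycle.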