I have a scrapy project where the item that ultimately enters my pipeline is relatively large and stores lots of metadata and content. Everything is working properly in my spider and pipelines. The logs, however, are printing out the entire scrapy Item as it leaves the pipeline (I believe):
2013-01-17 18:42:17-0600 [tutorial] DEBUG: processing Pipeline pipeline module
2013-01-17 18:42:17-0600 [tutorial] DEBUG: Scraped from <200 http://www.example.com>
{'attr1': 'value1',
'attr2': 'value2',
'attr3': 'value3',
...
snip
...
'attrN': 'valueN'}
2013-01-17 18:42:18-0600 [tutorial] INFO: Closing spider (finished)
I would rather not have all this data puked into log files if I can avoid it. Any suggestions about how to suppress this output?
Another approach is to override the __repr__ method of the Item subclasses to selectively choose which attributes (if any) to print at the end of the pipeline. This way, you can keep the log level at DEBUG and show only the attributes that you want to see coming out of the pipeline (to check attr1, for example).
Having read through the documentation and conducted a (brief) search through the source code, I can't see a straightforward way of achieving this aim.
The hammer approach is to set the logging level in the settings to INFO (i.e., add the following line to settings.py):
LOG_LEVEL='INFO'
This will strip out a lot of other information about the URLs/page that are being crawled, but it will definitely suppress data about processed items.
Or, if you know that the spider is working correctly, you can disable logging entirely:
LOG_ENABLED = False
I disable logging once my crawler runs fine.
If you want to exclude only some attributes from the output, you can extend the answer given by @dino.
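As a rough sketch of that idea, the override can drop a few bulky fields and print the rest. The mixin name, the hidden_fields attribute, and the content field are all hypothetical, and a plain dict stands in for the Item base class so the snippet runs without Scrapy installed. It relies on the item being convertible with dict(), which I believe holds for scrapy.Item, but treat that as an assumption to verify.

```python
class TrimmedReprMixin:
    """Hypothetical mixin: leave selected fields out of repr()."""

    # Field names to omit from the logged representation (assumed names).
    hidden_fields = ()

    def __repr__(self):
        # Build a plain dict of the fields we still want to see.
        visible = {
            key: value
            for key, value in dict(self).items()
            if key not in self.hidden_fields
        }
        return repr(visible)


# Stand-in for an Item subclass so the sketch runs on its own.
# In a real project this would presumably be something like
# `class PageItem(TrimmedReprMixin, scrapy.Item)` with Field() declarations.
class PageItem(TrimmedReprMixin, dict):
    hidden_fields = ("content",)


item = PageItem(attr1="value1", content="<html>...</html>" * 100)
print(repr(item))  # the large 'content' field is not printed
```

The data itself is untouched: item["content"] is still there for the pipeline and exporters, only the printed form changes.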
I think the cleanest way to do this is to add a filter to the scrapy.core.scraper logger that changes the message in question. This allows you to keep your Item's __repr__ intact and to not have to change Scrapy's logging level.
I tried the __repr__ approach mentioned by @dino, but it didn't work well for me. Building on his idea, I tried overriding the __str__ method instead, and that works.
Here's how I do it, very simple:
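One way this might look, as a sketch rather than the exact code: __str__ returns a one-line summary, while __repr__ is left alone. The class name and summary format are made up for illustration, and a plain dict again stands in for scrapy.Item so the snippet is self-contained. The log-like format string below is only there to show that %s formatting goes through __str__; it is not claimed to be Scrapy's exact message.

```python
# Stand-in base class; in a Scrapy project this would presumably
# subclass scrapy.Item and declare its fields with Field().
class SummaryItem(dict):
    def __str__(self):
        # One short line instead of the full field dump.
        return "<%s with %d fields>" % (type(self).__name__, len(self))


item = SummaryItem(attr1="value1", attr2="value2")

# %s-style formatting calls str(), so the summary is what gets printed.
line = "Scraped from %(src)s: %(item)s" % {
    "src": "<200 http://www.example.com>",
    "item": item,
}
print(line)

# repr() is unchanged, which is handy when debugging interactively.
print(repr(item))
```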