Is there a special mechanism to force Scrapy to print out the full Python exception/stack trace?

I made a simple mistake of getting a list attribute wrong, resulting in an AttributeError that did not show up in full in the logs. What showed up was:
2015-11-15 22:13:50 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 264,
'downloader/request_count': 1,
'downloader/request_method_count/GET': 1,
'downloader/response_bytes': 40342,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2015, 11, 15, 22, 13, 50, 860480),
'log_count/CRITICAL': 1,
'log_count/DEBUG': 1,
'log_count/INFO': 1,
'response_received_count': 1,
'scheduler/dequeued': 1,
'scheduler/dequeued/memory': 1,
'scheduler/enqueued': 1,
'scheduler/enqueued/memory': 1,
'spider_exceptions/AttributeError': 1,
'start_time': datetime.datetime(2015, 11, 15, 22, 13, 49, 222371)}
So it showed the AttributeError count of 1, but it didn't tell me where or how it happened. I had to manually place ipdb.set_trace() in the code to find out where the error occurred; Scrapy itself just carried on with its other requests without printing anything:
ipdb>
AttributeError: "'list' object has no attribute 'match'"
> /Users/username/Programming/regent/regentscraper/spiders/regent_spider.py(139)request_listing_detail_pages_from_listing_id_list()
138 volatile_props = ListingScanVolatilePropertiesItem()
--> 139 volatile_props['position_in_search'] = list_of_listing_ids.match(listing_id) + rank_of_first_item_in_page
140
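One generic way to surface such errors without resorting to the debugger is to wrap a spider callback so the full traceback is logged before the exception propagates. This is only a sketch of the idea in plain Python (the decorator name `log_full_traceback` is my own, not a Scrapy API); Scrapy also provides a `spider_error` signal you can connect a handler to.

```python
import functools
import logging
import traceback

def log_full_traceback(callback):
    """Wrap a callback so any exception is logged with its full
    stack trace before being re-raised, instead of showing up only
    as a stats counter."""
    @functools.wraps(callback)
    def wrapper(*args, **kwargs):
        try:
            return callback(*args, **kwargs)
        except Exception:
            # format_exc() includes the file name and line number
            logging.error(traceback.format_exc())
            raise
    return wrapper
```

Applied as `@log_full_traceback` on a parse method, the AttributeError above would be printed with the offending file and line number instead of being swallowed.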
Here are my Scrapy settings:
# -*- coding: utf-8 -*-
# Scrapy settings for regentscraper project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
# http://doc.scrapy.org/en/latest/topics/settings.html
# http://scrapy.readthedocs.org/en/latest/topics/downloader-middleware.html
# http://scrapy.readthedocs.org/en/latest/topics/spider-middleware.html
import sys
import os
import django
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), os.pardir)))  # __file__, not __name__
print(sys.path)
os.environ['DJANGO_SETTINGS_MODULE'] = 'regent.settings'
django.setup()  # new for Django 1.8
BOT_NAME = 'regentscraper'
SPIDER_MODULES = ['regentscraper.spiders']
NEWSPIDER_MODULE = 'regentscraper.spiders'
ITEM_PIPELINES = {
    'regentscraper.pipelines.ListingScanPipeline': 300,
}
From what I can see of the stack trace, the error happens in your spider where you call .match() on list_of_listing_ids. As the trace says, "'list' object has no attribute 'match'": the object is a plain Python list, and lists have no match method. It could also be that somewhere in your full spider code you reassigned that value as a list.

It would help if you included the complete spider code as well as your items to work this out, though it was two years ago, so I'm sure you've moved on or figured it out.

For good measure: never name a variable list, since that shadows Python's built-in list type and leads to exactly this kind of confusing error.
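To illustrate the point with a small, self-contained example (the listing IDs here are hypothetical, not the asker's data): lists have no .match() method, and .index() is the likely intended call for finding an element's position:

```python
list_of_listing_ids = ["a101", "b202", "c303"]  # hypothetical data

# What the spider did -- lists have no .match() method:
try:
    list_of_listing_ids.match("b202")
except AttributeError as e:
    print(e)  # 'list' object has no attribute 'match'

# The likely intent -- .index() returns the position of an element:
position = list_of_listing_ids.index("b202")
print(position)  # 1
```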
I encountered the same problem as described above. I solved it by adding LOGGING_CONFIG = None to the Django settings that are loaded in Scrapy. I created a new Django settings file, settings_scrapy, with the following contents:
mysite.settings_scrapy
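The actual file contents were not included in the answer; a plausible reconstruction (the star-import line and comments are my assumption, only LOGGING_CONFIG = None is stated in the answer) would be:

```python
# mysite/settings_scrapy.py -- hypothetical reconstruction
from mysite.settings import *  # reuse the normal project settings

# Disable Django's logging configuration so it cannot capture and
# suppress tracebacks from code (such as Scrapy spiders) running alongside it
LOGGING_CONFIG = None
```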
Then, the settings file is loaded in Scrapy's settings file as:
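The exact snippet was also omitted from the answer; presumably it mirrors the question's settings.py, pointing DJANGO_SETTINGS_MODULE at the new module before django.setup() runs (the module name mysite.settings_scrapy is taken from the answer; everything else here is an assumption):

```python
import os

# Select the Scrapy-specific Django settings before django.setup() is called
os.environ['DJANGO_SETTINGS_MODULE'] = 'mysite.settings_scrapy'
```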
After that, stack traces for exceptions raised in spiders and pipelines appeared.
Reference
https://docs.djangoproject.com/en/1.11/topics/logging/#disabling-logging-configuration