Scrapy not printing out the stack trace on an exception

Posted 2019-05-10 22:58

Question:

Is there a special mechanism to force Scrapy to print out all Python exceptions and stack traces?

I made a simple mistake of getting a list attribute wrong, which raised an AttributeError that did not show up in full in the logs. All that showed up was:

2015-11-15 22:13:50 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 264,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'downloader/response_bytes': 40342,
 'downloader/response_count': 1,
 'downloader/response_status_count/200': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2015, 11, 15, 22, 13, 50, 860480),
 'log_count/CRITICAL': 1,
 'log_count/DEBUG': 1,
 'log_count/INFO': 1,
 'response_received_count': 1,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'spider_exceptions/AttributeError': 1,
 'start_time': datetime.datetime(2015, 11, 15, 22, 13, 49, 222371)}

So it showed an AttributeError count of 1, but didn't tell me where or how it happened. I had to manually place ipdb.set_trace() in the code to find out where the error occurred. Left to itself, Scrapy carried on with the rest of the crawl without printing anything. With the breakpoint in place, ipdb showed:

ipdb>
AttributeError: "'list' object has no attribute 'match'"
> /Users/username/Programming/regent/regentscraper/spiders/regent_spider.py(139)request_listing_detail_pages_from_listing_id_list()
    138             volatile_props = ListingScanVolatilePropertiesItem()
--> 139             volatile_props['position_in_search'] = list_of_listing_ids.match(listing_id) + rank_of_first_item_in_page
    140
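Independently of the root cause, one generic way to make sure a callback's traceback reaches the log is to wrap the callback so that any exception is logged with logger.exception() before it propagates. This is a minimal Python 3 sketch, not a Scrapy feature: log_exceptions and the logger name are hypothetical, and only plain and generator callbacks are handled. Generator callbacks need their own branch because a callback that yields requests or items only raises while Scrapy iterates over it, not when it is called.

```python
import functools
import inspect
import logging

# Hypothetical logger name chosen for this sketch.
logger = logging.getLogger("regentscraper.callbacks")


def log_exceptions(func):
    """Log the full traceback of any exception raised by func, then re-raise it.

    Handles plain functions and generator functions (callbacks that yield
    requests or items), whose errors only appear while they are iterated.
    """
    if inspect.isgeneratorfunction(func):
        @functools.wraps(func)
        def gen_wrapper(*args, **kwargs):
            try:
                # Forward every yielded value; errors surface during iteration.
                yield from func(*args, **kwargs)
            except Exception:
                # logger.exception logs at ERROR level and appends the traceback.
                logger.exception("Unhandled error in %s", func.__qualname__)
                raise
        return gen_wrapper

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            logger.exception("Unhandled error in %s", func.__qualname__)
            raise
    return wrapper
```

Applied as @log_exceptions on a spider method, the wrapper re-raises the exception after logging it, so Scrapy's own error handling (including the spider_exceptions/... stats shown above) should be unaffected.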

My Scrapy settings:

# -*- coding: utf-8 -*-

# Scrapy settings for regentscraper project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
#     http://doc.scrapy.org/en/latest/topics/settings.html
#     http://scrapy.readthedocs.org/en/latest/topics/downloader-middleware.html
#     http://scrapy.readthedocs.org/en/latest/topics/spider-middleware.html

import sys
import os
import django
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__name__), os.pardir)))

print(sys.path)

os.environ['DJANGO_SETTINGS_MODULE'] = 'regent.settings'
django.setup()  #new for Django 1.8



BOT_NAME = 'regentscraper'

SPIDER_MODULES = ['regentscraper.spiders']
NEWSPIDER_MODULE = 'regentscraper.spiders'


ITEM_PIPELINES = {
   'regentscraper.pipelines.ListingScanPipeline': 300,
}

Answer 1:

I ran into the same problem as described above. My environment uses the following versions:

  • Django (1.11.4)
  • Scrapy (1.4.0)
  • scrapy-djangoitem (1.1.1)

I solved the problem by adding "LOGGING_CONFIG = None" to the Django settings that Scrapy loads. I created a new Django settings file, settings_scrapy, with the following contents:

mysite.settings_scrapy

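# Reuse the regular Django settings unchanged
from mysite.settings import *

# Skip Django's logging configuration step so it does not interfere
# with the logging that Scrapy sets up
LOGGING_CONFIG = None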

Then load that settings module from Scrapy's settings file:

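import sys
import os
import django

# Put the directory two levels above this file on sys.path
# so the Django project package can be imported
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
# Point Django at the Scrapy-specific settings module
os.environ['DJANGO_SETTINGS_MODULE'] = 'mysite.settings_scrapy'
django.setup()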

After that, stack traces for exceptions raised in spiders and pipelines appeared in the log.
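One plausible explanation of why this works (an assumption; the answer does not show mysite.settings): if the Django project's LOGGING setting uses "disable_existing_loggers": True, applying it with logging.config.dictConfig during django.setup() disables every logger that already exists at that moment, which can include Scrapy's own loggers. Setting LOGGING_CONFIG = None tells Django to skip that step. The standard-library sketch below shows the effect; the LOGGING dict is hypothetical and the logger name only stands in for a Scrapy logger.

```python
import logging
import logging.config

# Stand-in for a Scrapy logger that already exists when django.setup() runs
# (assumption: the exact loggers and import order depend on the project).
early_logger = logging.getLogger("scrapy.core.scraper")

# Hypothetical Django LOGGING setting; the real mysite.settings is not shown.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": True,
    "handlers": {"console": {"class": "logging.StreamHandler"}},
    "loggers": {"django": {"handlers": ["console"], "level": "INFO"}},
}

print(early_logger.disabled)  # False: the logger works normally at this point

# Roughly what django.setup() does with the default LOGGING_CONFIG
# ("logging.config.dictConfig"); LOGGING_CONFIG = None skips this call.
logging.config.dictConfig(LOGGING)

# Every pre-existing logger not named in LOGGING is now disabled, so its
# records (including error tracebacks) are silently dropped.
print(early_logger.disabled)  # True
```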

Reference

https://docs.djangoproject.com/en/1.11/topics/logging/#disabling-logging-configuration



Answer 2:

From what I can see of the stack trace from your spider, it looks like you're trying to concatenate a value into an item field?

I'd encourage you to also include the complete spider code as well as your items to help work this out, though this was asked two years ago, so I'm sure you've moved on or figured it out.

As the stack trace notes, "'list' object has no attribute 'match'". That may come from using list, which is already a built-in in Python, as you know. It seems to be the culprit, since the stack trace is telling you that a list has no attribute named match, so the object is being treated as a list.

It could also be that, somewhere in your full spider code, you look up your item value and then redefine it as a list?

For good measure, unless you actually mean the built-in list, name your variables anything other than list.
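Whatever the variable is called, the object on the failing line is a list instance, and Python lists have no .match() method, which is exactly what the AttributeError reports. The sketch below reproduces the error with hypothetical stand-in values and assumes (the question does not say) that the intent was the listing's position in the list, for which list.index() is the built-in method.

```python
# Hypothetical stand-ins for the variables on the failing line.
list_of_listing_ids = ["1001", "1002", "1003"]
rank_of_first_item_in_page = 21
listing_id = "1002"

# Lists have no .match() method, so this raises AttributeError.
try:
    list_of_listing_ids.match(listing_id)
except AttributeError as exc:
    error_message = str(exc)

# Assumption: the intended value is the item's position, i.e. list.index().
position_in_search = list_of_listing_ids.index(listing_id) + rank_of_first_item_in_page
print(error_message)       # 'list' object has no attribute 'match'
print(position_in_search)  # 22
```

Note that list.index() itself raises ValueError if listing_id is not in the list, so real code may need to handle that case.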