I have a multiprocess function in Python using Scrapy, shown below, that needs to be fixed. Could you make run_spider() verify first that response.css('div.quote') actually matches something, so it won't run if the result is blank? Right now it still runs when the result is blank (you can try changing the selector to something else, like response.css('xxx'), to see what I mean).
Here is my code:
import scrapy
import scrapy.crawler as crawler
from multiprocessing import Process, Queue
from twisted.internet import reactor

# your spider
class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ['http://quotes.toscrape.com/tag/humor/']

    def parse(self, response):
        for quote in response.css('div.quote'):
            print(quote.css('span.text::text').extract_first())

# the wrapper to make it run more times
def run_spider():
    def f(q):
        try:
            runner = crawler.CrawlerRunner()
            deferred = runner.crawl(QuotesSpider)
            deferred.addBoth(lambda _: reactor.stop())
            reactor.run()
            q.put(None)
        except Exception as e:
            q.put(e)

    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    result = q.get()
    p.join()

    if result is not None:
        raise result

print('first run:')
run_spider()
print('\nsecond run:')
run_spider()
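To show the kind of behaviour I'm after, here is a rough sketch (not working code from my project: scrape() is a made-up stand-in for the real Scrapy crawl, and the zero-count check is my guess at what the fix might look like). The child process reports how many items the selector matched back through the queue, and the parent raises instead of silently continuing when the count is zero:

```python
from multiprocessing import Process, Queue

def scrape():
    # Stand-in for response.css('div.quote'): returns the matched nodes.
    # Imagine an empty selection here, like response.css('xxx').
    return []

def f(q):
    try:
        items = scrape()
        # Report the item count instead of just None.
        q.put(('ok', len(items)))
    except Exception as e:
        q.put(('error', e))

def run_spider():
    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    status, payload = q.get()
    p.join()
    if status == 'error':
        raise payload
    if payload == 0:
        # This is the check I want: refuse to proceed on a blank result.
        raise ValueError('selector matched nothing; refusing to run')
    return payload

if __name__ == '__main__':
    try:
        run_spider()
    except ValueError as err:
        print('blocked:', err)
```

So basically: instead of q.put(None) on success, something should tell the parent whether the crawl actually found quotes.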
Help me, please!
Thanks in advance!