Scrapy + Splash (Docker) Issue

Posted 2020-06-30 06:13

Question:

I have scrapy and scrapy-splash set up on an AWS Ubuntu server. It works fine for a while, but after a few hours I'll start getting error messages like this:

Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.5/site-packages/twisted/internet/defer.py", line 1384, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/home/ubuntu/.local/lib/python3.5/site-packages/twisted/python/failure.py", line 393, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/ubuntu/.local/lib/python3.5/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request, spider=spider)))
twisted.internet.error.ConnectionRefusedError: Connection was refused by other side: 111: Connection refused.

When this happens, I find that the Splash process in Docker has either terminated or become unresponsive.

I've been running the splash process with:

sudo docker run -p 8050:8050 scrapinghub/splash

as per the scrapy-splash instructions.

I tried starting the process in a tmux shell to make sure the ssh connection is not interfering with the splash process, but no luck.

Thoughts?

Answer 1:

You should run the container with the --restart and -d options. See the Splash documentation on how to run Splash in production.
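Putting that together, a sketch of the adjusted command. `-d` detaches the container so it survives your ssh session (making tmux unnecessary), and `--restart=always` tells Docker to restart Splash if it crashes. The memory limit and Splash's `--maxrss` value shown here are illustrative assumptions, not values from the question; tune them to your instance size:

```shell
# Run Splash detached and auto-restart it on crashes.
# --memory caps the container; --maxrss (a Splash option) makes Splash
# restart its own process before it exhausts that cap. Both values are
# examples -- adjust for your AWS instance.
sudo docker run -d -p 8050:8050 \
    --restart=always \
    --memory=4.5G \
    scrapinghub/splash --maxrss 3000
```

With `--restart=always` the container also comes back after a host reboot (as long as the Docker daemon starts on boot), which covers the "process has terminated" half of the problem; an unresponsive-but-alive Splash is mitigated by the `--maxrss` self-restart.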