Using docker, scrapy splash on Heroku

I have a Scrapy spider that uses Splash, which runs in Docker on localhost:8050, to render JavaScript before scraping. I am trying to run this on Heroku but have no idea how to configure Heroku to start Docker and run Splash before my web: scrapy crawl abc dyno starts. Any guidance is greatly appreciated!

Tags: docker heroku scrapy splash-js-render

2 answers

beautiful°

Reply #2 · 2020-02-26 11:25

I ran into the same problem and eventually succeeded in deploying the Splash Docker image on Heroku. My solution: I cloned the Splash project from GitHub and changed the Dockerfile.

- Removed the EXPOSE command, because Heroku does not support it
- Replaced ENTRYPOINT with a CMD command:

CMD python3 /app/bin/splash --proxy-profiles-path /etc/splash/proxy-profiles --js-profiles-path /etc/splash/js-profiles --filters-path /etc/splash/filters --lua-package-path /etc/splash/lua_modules/?.lua --port $PORT

Notice that I added the option --port $PORT, so that Splash listens on the port specified by Heroku instead of the default (8050).

A fork of the project with this change is available here. You just need to build the Docker image and push it to Heroku's registry, like you did before. You can test it locally first, but you must pass the PORT environment variable when running the container:

sudo docker run -p 80:80 -e PORT=80 mynewsplashimage
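
To confirm that the locally started container is actually answering on the port you passed in, a small check against Splash's /_ping health endpoint can help. This is only a minimal sketch: the helper names splash_ping_url and is_splash_up are made up for illustration, and it assumes the container was started with the command above, so that Splash is reachable on localhost port 80.

```python
import json
import urllib.request


def splash_ping_url(host: str, port: int) -> str:
    """Build the URL of Splash's /_ping health endpoint."""
    return f"http://{host}:{port}/_ping"


def is_splash_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with a JSON body whose status is "ok"."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            payload = json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts and HTTP error statuses;
        # ValueError covers a body that is not valid JSON.
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


if __name__ == "__main__":
    # Assumption: port 80 matches "docker run -p 80:80 -e PORT=80".
    url = splash_ping_url("localhost", 80)
    print(url, "->", "up" if is_splash_up(url) else "not reachable")
```

If the check reports "not reachable", the usual suspect is a mismatch between the PORT value and the -p port mapping.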


唯我独甜

Reply #3 · 2020-02-26 11:32

From what I gather you're expecting:

Splash instance running on Heroku via Docker container
Your web application (Scrapy spider) running in a Heroku dyno

Splash instance

As seen in Heroku's Container Registry - Pushing existing image(s):
- Ensure docker CLI and heroku CLI are installed
- heroku container:login
- docker tag scrapinghub/splash registry.heroku.com/<app-name>/web
- docker push registry.heroku.com/<app-name>/web
- heroku container:release web -a <app-name>
- To test the application: heroku open -a <app-name>. This should show the Splash UI at the app's Heroku URL. Heroku's router serves the app on the standard HTTP/HTTPS ports and forwards requests to the port given in $PORT, so port 8050 is not reachable from outside.
  - Splash must therefore listen on $PORT, as the EXPOSE docker configuration is not respected (https://devcenter.heroku.com/articles/container-registry-and-runtime#dockerfile-commands-and-runtime)

Running Dyno Scrapy Web App

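Configure your application to point to the Splash app's public Heroku URL (the one that heroku open -a <app-name> opens) instead of localhost:8050. Heroku serves web dynos on the standard HTTP/HTTPS ports, so no :8050 suffix is needed. The Scrapy spider should then be able to send requests to the Splash instance deployed above.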
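
One way to wire this up is to read the Splash address from an environment variable in the spider's settings.py, so the same code works locally and on Heroku. The sketch below rests on two assumptions: that the spider uses the scrapy-splash plugin (which reads the SPLASH_URL setting), and that you define a config var for it. The config var name SPLASH_URL and the helper resolve_splash_url are chosen here for illustration and do not come from the original answer.

```python
import os

# Local Docker default mentioned in the question.
DEFAULT_SPLASH_URL = "http://localhost:8050"


def resolve_splash_url(env) -> str:
    """Pick the Splash base URL from a mapping such as os.environ."""
    # Assumption: the config var is named SPLASH_URL (hypothetical name).
    url = env.get("SPLASH_URL", "").strip()
    # Fall back to the local default and drop any trailing slash.
    return (url or DEFAULT_SPLASH_URL).rstrip("/")


# In settings.py: scrapy-splash reads this setting to locate Splash.
SPLASH_URL = resolve_splash_url(os.environ)
```

On Heroku the variable could then be set on the spider's app with heroku config:set SPLASH_URL=<splash-app-url> -a <spider-app-name>, where both placeholders stand for your own values.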
