I have a Nokogiri web scraper that writes to a database, and I'm trying to deploy the whole thing to Heroku. I have a Sinatra application frontend that I want to pull from that database. I'm new to Heroku and web development, and I don't know the best way to handle something like this.
Do I have to place the web scraper script that uploads to the database under a Sinatra route (like mywebsite.com/scraper) and just make it so obscure that no one visits it? In the end, I'd like the Sinatra part to be a REST API that pulls from the database.
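Roughly, the kind of endpoint I'm picturing for the Sinatra side is something like this (Sequel and the items table are just stand-ins for whatever I end up using):

    require 'sinatra'
    require 'sequel'
    require 'json'

    # Connect to the same database the scraper writes to
    DB = Sequel.connect(ENV['DATABASE_URL'])

    # Return the scraped rows as JSON
    get '/items' do
      content_type :json
      DB[:items].all.to_json
    end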
Thanks for any input.
There are two approaches you can take.
The first is to use one-off dynos, running the scraper from the console with heroku run YOURCMD. Just make sure the scraper doesn't write to disk but uses the database.
More information: https://devcenter.heroku.com/articles/one-off-dynos
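For example, a standalone scraper script (scraper.rb here, with Sequel, the selector, and the items table as stand-ins for your own setup) that writes straight to the database could look roughly like this:

    # scraper.rb
    require 'nokogiri'
    require 'open-uri'
    require 'sequel'

    # Use the Heroku-provided database, never the local filesystem
    DB = Sequel.connect(ENV['DATABASE_URL'])

    # Fetch and parse the page
    doc = Nokogiri::HTML(URI.open('https://example.com/articles'))

    # Store each scraped link in the database
    doc.css('a.article').each do |link|
      DB[:items].insert(title: link.text.strip, url: link['href'])
    end

You would then run it as a one-off dyno with heroku run ruby scraper.rb.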
The second is to differentiate between the scraper and the web process, so that you have a web process for normal UI interaction and a scraper process that the web process can spawn or talk to. If you take this route, it's up to you how to protect it from the rest of the world (auth, URL obfuscation, etc.).
More information: https://devcenter.heroku.com/articles/background-jobs-queueing
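For example, a minimal Procfile for that split could look something like this (app.rb and scrape_worker.rb are placeholder file names; the worker should be a long-running process such as a job-queue worker or a loop with a sleep, not a script that exits immediately, since Heroku restarts worker dynos that exit):

    web: bundle exec ruby app.rb -p $PORT
    worker: bundle exec ruby scrape_worker.rb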
I did it by creating a rake task and using one-off dynos, as mentioned by XLII.
Here is my rake task file:
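The exact contents depend on your scraper, but it's roughly along these lines (the Scraper class and the scraper:run task name are placeholders for whatever your own code defines):

    # Rakefile
    require_relative 'scraper'  # placeholder: the file that defines the Scraper class

    namespace :scraper do
      desc 'Run the Nokogiri scraper and save the results to the database'
      task :run do
        Scraper.new.run
      end
    end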
You can simply run it by calling:
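With the placeholder task name from the sketch above, that would be something like:

    heroku run rake scraper:run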