I have a Nokogiri web scraper that writes to a database, and I'm trying to deploy the whole thing to Heroku. I have a Sinatra application frontend that I want to pull from that database. I'm new to Heroku and web development, and I don't know the best way to handle something like this.
Do I have to place the web scraper script that uploads to the database under a Sinatra route (like mywebsite.com/scraper) and just make it so obscure that no one visits it? In the end, I'd like the Sinatra part to be a REST API that pulls from the database.
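Roughly, the kind of endpoint I'm picturing for the Sinatra side is something like this (Sequel and the items table are just stand-ins for whatever I end up using):

    require 'sinatra'
    require 'sequel'
    require 'json'

    # Connect to the same database the scraper writes to
    DB = Sequel.connect(ENV['DATABASE_URL'])

    # Return the scraped rows as JSON
    get '/items' do
      content_type :json
      DB[:items].all.to_json
    end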
Thanks for any input.
There are two approaches you can take.
The first is to use one-off dynos, running the scraper from the console with heroku run YOURCMD. Just make sure the scraper doesn't write to disk but uses the database.
More information: https://devcenter.heroku.com/articles/one-off-dynos
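For example, a standalone scraper script (scraper.rb here, with Sequel, the selector, and the items table as stand-ins for your own setup) that writes straight to the database could look roughly like this:

    # scraper.rb
    require 'nokogiri'
    require 'open-uri'
    require 'sequel'

    # Use the Heroku-provided database, never the local filesystem
    DB = Sequel.connect(ENV['DATABASE_URL'])

    # Fetch and parse the page
    doc = Nokogiri::HTML(URI.open('https://example.com/articles'))

    # Store each scraped link in the database
    doc.css('a.article').each do |link|
      DB[:items].insert(title: link.text.strip, url: link['href'])
    end

You would then run it as a one-off dyno with heroku run ruby scraper.rb.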
The second is to differentiate between the scraper and the web process, so that you have a web process for normal UI interaction and a scraper process that the web process can spawn or talk to. If you take this route, it's up to you how to protect it from the rest of the world (auth, URL obfuscation, etc.).
More information: https://devcenter.heroku.com/articles/background-jobs-queueing
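For example, a minimal Procfile for that split could look something like this (app.rb and scrape_worker.rb are placeholder file names; the worker should be a long-running process such as a job-queue worker or a loop with a sleep, not a script that exits immediately, since Heroku restarts worker dynos that exit):

    web: bundle exec ruby app.rb -p $PORT
    worker: bundle exec ruby scrape_worker.rb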
I did it by creating a rake task and using one-off dynos, as mentioned by XLII.
Here is my rake task file:
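The exact contents depend on your scraper, but it's roughly along these lines (the Scraper class and the scraper:run task name are placeholders for whatever your own code defines):

    # Rakefile
    require_relative 'scraper'  # placeholder: the file that defines the Scraper class

    namespace :scraper do
      desc 'Run the Nokogiri scraper and save the results to the database'
      task :run do
        Scraper.new.run
      end
    end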
You can simply run it by calling:
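With the placeholder task name from the sketch above, that would be something like:

    heroku run rake scraper:run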