I have just installed the whenever gem (https://github.com/javan/whenever) to run my rake tasks, which are nokogiri / feedzilla dependent scraping tasks.
E.g. my tasks are called grab_bbc, grab_guardian, etc.
My question: as I update my site, I keep adding more tasks to scheduler.rake.
What should I write in my config/schedule.rb to make all rake tasks run, no matter what they are called?
Would something like this work?
every 12.hours do
  rake:task.each do |task|
    runner task
  end
end
Am new to Cron, using RoR 4.
namespace :sc do
  desc 'All'
  task all: [:create_categories, :create_subcategories]

  desc 'Create categories'
  task create_categories: :environment do
    # your code
  end

  desc 'Create subcategories'
  task create_subcategories: :environment do
    # your code
  end
end
In the console, run: $ rake sc:all
Write a separate rake task for each scraping job, then write an aggregate task that runs all of those scraping tasks.
desc "scrape nytimes"
task :scrape_nytimes do
  # scraping method
end

desc "scrape guardian"
task :scrape_guardian do
  # scraping method
end

desc "perform all scraping"
task :scrape do
  Rake::Task[:scrape_nytimes].execute
  Rake::Task[:scrape_guardian].execute
end
Then call the rake task as:
rake scrape
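One detail worth knowing about the aggregate task above: `execute` runs only a task's own body, while `invoke` also runs its prerequisites and will not run the same task twice in one session. Here is a minimal sketch of that difference, runnable outside a Rakefile. The `Demo` module, the `LOG` array and the `:setup` prerequisite are hypothetical names made up for illustration; they are not part of the answer.

```ruby
require "rake"

LOG = [] # records which task bodies actually ran (illustration only)

module Demo
  # Extending Rake::DSL makes `task` available outside a Rakefile.
  extend Rake::DSL

  # Hypothetical prerequisite, e.g. something like :environment in Rails.
  task(:setup) { LOG << "setup" }
  task(scrape_guardian: :setup) { LOG << "scrape_guardian" }
end

# execute: runs the task body only, prerequisites are skipped.
Rake::Task[:scrape_guardian].execute
after_execute = LOG.dup

# invoke: runs prerequisites first, then the task body.
LOG.clear
Rake::Task[:scrape_guardian].invoke
after_invoke = LOG.dup

# A second invoke is a no-op, because the task is already marked as invoked.
LOG.clear
Rake::Task[:scrape_guardian].invoke
after_second_invoke = LOG.dup

puts after_execute.inspect
puts after_invoke.inspect
puts after_second_invoke.inspect
```

So if the scraping tasks depend on something like `:environment`, `invoke` is probably the safer call inside the aggregate task.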
Make sure you have a unique namespace with all the tasks in it, like:
namespace :scrapers do
  desc "Scraper Number 1"
  task :scrape_me do
    # Your code here
  end

  desc "Scraper Number 2"
  task :scrape_it do
    # Your code here
  end
end
You could then run all tasks of that namespace with a task outside of that namespace:
task :run_all_scrapers do
  Rake.application.tasks.each do |t|
    # String#start_with? is core Ruby; starts_with? needs ActiveSupport
    t.invoke if t.name.start_with?("scrapers:")
  end
end
That said, I'm pretty sure that this is not how you should run a set of scrapers. If for any reason the if condition returns true for a task you did not mean to include, you might unintentionally run tasks like rake db:drop.
Either "manually" maintaining schedule.rb or a master task seems like a better option to me.
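To make the prefix-filtering idea concrete, here is a small self-contained sketch that can be run as a plain Ruby script. It only selects tasks whose full name starts with "scrapers:", so a task outside that namespace is left alone. The `ScraperTasks` module, the `RAN` array, the `scraper_tasks` helper and the `:unrelated` task are hypothetical names used for illustration.

```ruby
require "rake"

RAN = [] # records which tasks actually ran (illustration only)

module ScraperTasks
  # Extending Rake::DSL makes `task` and `namespace` available outside a Rakefile.
  extend Rake::DSL

  namespace :scrapers do
    task(:scrape_me) { RAN << "scrapers:scrape_me" }
    task(:scrape_it) { RAN << "scrapers:scrape_it" }
  end

  # Stands in for something dangerous like db:drop; it must never be picked up.
  task(:unrelated) { RAN << "unrelated" }
end

# Hypothetical helper: every defined task whose full name begins with the prefix.
def scraper_tasks(prefix = "scrapers:")
  Rake.application.tasks.select { |t| t.name.start_with?(prefix) }
end

scraper_tasks.each(&:invoke)

puts RAN.sort.inspect
```

Matching on the full "scrapers:" prefix, including the colon, is what keeps the filter from catching other tasks, but it still depends on nobody adding an unexpected task to that namespace.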
The aggregated task can be concise:
namespace :scrape do
  desc "scrape nytimes"
  task :nytimes do
    # scraping method
  end

  desc "scrape guardian"
  task :guardian do
    # scraping method
  end
end

desc "perform all scraping"
task scrape: ['scrape:nytimes', 'scrape:guardian']
Namespaces are also a good practice.
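With an aggregate task in place, config/schedule.rb needs only one entry, and new scrapers get picked up by adding them to the aggregate task rather than to the schedule. A minimal sketch, assuming the aggregate task is named `scrape` as above and using whenever's built-in `rake` job type:

```ruby
# config/schedule.rb
every 12.hours do
  rake "scrape"
end
```

After editing the file, running `whenever --update-crontab` from the app directory should write the entry to your crontab.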