I need to import a large CSV file, broken down into small chunks that will be imported every X hours.
I made the following rake task:
task :import_reviews => :environment do
  require 'csv'
  CSV.foreach('reviews.csv', :headers => true) do |row|
    Review.create(row.to_hash)
  end
end
Using Heroku Scheduler I could run this task every day, but I want to break it up into several chunks, for example 100 records per day.
That means I need to keep track of the last row imported, and start at that row + 1 the next time the rake task runs. How can I implement this?
Thanks in advance!
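One way to track the last imported row is to persist a row offset and skip that many rows on the next run. A minimal sketch, untested against a real app: the `last_row.txt` file name, the `BATCH_SIZE` constant, and the `import_batch` helper are placeholders of mine, and `save_row` stands in for `Review.create` so the body can live inside your rake task unchanged:

```ruby
require 'csv'

BATCH_SIZE = 100
OFFSET_FILE = 'last_row.txt'  # hypothetical file holding the count of already-imported rows

# Import the next BATCH_SIZE rows after the stored offset, then advance the offset.
# `save_row` is a callable standing in for Review.create(row.to_hash).
def import_batch(csv_path, save_row)
  offset = File.exist?(OFFSET_FILE) ? File.read(OFFSET_FILE).to_i : 0
  rows = CSV.foreach(csv_path, :headers => true).drop(offset).first(BATCH_SIZE)
  rows.each { |row| save_row.call(row.to_hash) }
  File.write(OFFSET_FILE, (offset + rows.size).to_s)  # resume point for the next run
  rows.size
end
```

Inside the rake task you would call `import_batch('reviews.csv', ->(h) { Review.create(h) })`; once it returns 0, the file is fully imported.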
Read the rest of the CSV into an array and, outside the CSV.foreach loop, write it back to the same CSV file, so the file gets smaller on each run. I don't think I need to show this in code, but leave a comment if necessary and I will.
If you want to keep the CSV whole, add a "processed" field to the CSV instead and set it to 1 once a row has been read; on the next run, filter those rows out.
EDIT: this isn't tested and could surely be better, but it shows what I mean:
require 'csv'

index = 1
CSV.open('new.csv', 'wb') do |csv_out|
  # copy the header line so the next run still finds headers
  csv_out << CSV.open('reviews.csv', &:shift)
  CSV.foreach('reviews.csv', :headers => true) do |row|
    if index <= 100
      Review.create(row.to_hash)
    else
      csv_out << row
    end
    index += 1
  end
end
Afterwards, delete reviews.csv and rename new.csv to reviews.csv.
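The "processed" column variant mentioned above could be sketched like this (also untested; the `import_with_flag` helper and the column name are my own, and `save_row` stands in for `Review.create`):

```ruby
require 'csv'

# Read the whole CSV, import up to `limit` rows that are not yet flagged,
# then write every row back with the 'processed' column set for imported rows.
def import_with_flag(path, limit, save_row)
  rows = CSV.read(path, :headers => true)
  imported = 0
  rows.each do |row|
    next if row['processed'] == '1'   # already imported on an earlier run
    break if imported >= limit
    save_row.call(row.to_hash)
    row['processed'] = '1'            # appends the field if the column is new
    imported += 1
  end
  headers = rows.headers.include?('processed') ? rows.headers : rows.headers + ['processed']
  CSV.open(path, 'wb') do |out|
    out << headers
    rows.each { |row| out << headers.map { |h| row[h] } }
  end
  imported
end
```

The trade-off versus the shrinking-file approach is that the file keeps its full history, at the cost of rewriting every row on each run.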
You might want to do something like this for the chunked CSV parsing, and then enqueue the jobs that hit the database with Resque, scheduling them appropriately so they run throttled:
https://gist.github.com/3101950
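In case the gist link rots, the chunking part can be sketched queue-agnostically like this (my own sketch, not the gist's code; `each_chunk` is a made-up helper, and the actual `Resque.enqueue` call for a job class of your choosing would go in the block):

```ruby
require 'csv'

# Slice the CSV into chunks of `size` rows and yield each chunk with its index,
# so one background job can be enqueued per chunk, e.g.
#   each_chunk('reviews.csv', 100) { |rows, i| Resque.enqueue(ReviewImportJob, rows) }
# (ReviewImportJob is a hypothetical job class.)
def each_chunk(path, size)
  CSV.foreach(path, :headers => true).each_slice(size).with_index do |rows, i|
    yield rows.map(&:to_hash), i
  end
end
```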