Import records from CSV in small chunks (ruby on r

Published 2019-06-04 17:35

Question:

I need to import a large CSV file, broken down to small chunks that will be imported every X hours.

I made the following rake task:

task :import_reviews => :environment do
 require 'csv'
 CSV.foreach('reviews.csv', :headers => true) do |row|
  Review.create(row.to_hash)
 end
end

Using Heroku Scheduler I could let this task run every day, but I want to break the import up into several chunks, for example 100 records per day.

That means I need to keep track of the last row imported and start with the row after it the next time the rake task runs. How can I implement this?

Thanks in advance!

Answer 1:

Read the rest of the CSV into an array and, outside the CSV.foreach loop, write it back to the same CSV file, so that the file gets smaller on each run. I don't think this needs code, but leave a comment if you want it and I'll add some.

If you want to keep the CSV whole, add a "processed" field to the CSV and fill it with a 1 once a row has been read; on the next run, filter those rows out.
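A minimal sketch of the "processed" column idea, assuming the file is small enough to load into memory. The helper name import_batch and its block interface are hypothetical, not something from the question; the block stands in for Review.create.

```ruby
require 'csv'

# Hypothetical helper: imports up to batch_size unprocessed rows from path,
# marks them with processed = 1, and rewrites the file in place.
# The block receives each row as a Hash without the "processed" field.
def import_batch(path, batch_size = 100)
  table = CSV.read(path, headers: true)
  return 0 if table.size.zero?           # nothing to import

  imported = 0
  table.each do |row|
    break if imported >= batch_size
    next if row['processed'] == '1'      # handled by an earlier run

    attrs = row.to_hash
    attrs.delete('processed')
    yield attrs                          # e.g. Review.create(attrs)
    row['processed'] = '1'
    imported += 1
  end

  # Add the "processed" header if the original file did not have it.
  headers = table.headers | ['processed']
  CSV.open(path, 'wb') do |csv|
    csv << headers
    table.each { |row| csv << headers.map { |h| row[h] } }
  end
  imported
end
```

In the rake task this could be called as import_batch('reviews.csv') { |attrs| Review.create(attrs) }. The whole file is rewritten on every run, so for a very large CSV the shrinking-file approach may be cheaper.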

EDIT: this isn't tested and could surely be better, but it shows what I mean:

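require 'csv'

index = 1
# Rows beyond the first 100 are copied to new.csv for later runs.
CSV.open('new.csv', 'wb') do |csv_out|
  CSV.foreach('reviews.csv', :headers => true) do |row|
    csv_out << row.headers if index == 1  # keep the header row for the next run
    if index < 101
      Review.create(row.to_hash)  # import the first 100 rows
    else
      csv_out << row
    end
    index += 1
  end
end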

Afterward, delete reviews.csv and rename new.csv to reviews.csv.
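The swap can also be done from Ruby at the end of the task. This is a small sketch using FileUtils.mv from the standard library; the helper name swap_in_remaining is hypothetical, and the default file names are the ones used above.

```ruby
require 'fileutils'

# Hypothetical helper: replaces the original CSV with the file that holds
# the rows still waiting to be imported.
def swap_in_remaining(remaining_path = 'new.csv', original_path = 'reviews.csv')
  FileUtils.mv(remaining_path, original_path)
end
```

Calling swap_in_remaining right after the import loop leaves reviews.csv containing only the rows that have not been imported yet.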



Answer 2:

You might want to do something like this for the chunked CSV parsing, then enqueue the jobs that hit the database with Resque and schedule them appropriately so that they run throttled:

https://gist.github.com/3101950
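For illustration, here is one way the chunked parsing could look using only the standard library. It is a sketch, not a copy of the linked gist; the helper name each_csv_chunk is hypothetical, and it assumes a Ruby version where CSV.foreach returns an enumerator when called without a block.

```ruby
require 'csv'

# Hypothetical helper: reads the CSV lazily and yields arrays of at most
# chunk_size rows, each row converted to a Hash keyed by header.
def each_csv_chunk(path, chunk_size = 100)
  CSV.foreach(path, headers: true).each_slice(chunk_size) do |rows|
    yield rows.map(&:to_hash)
  end
end

# Possible usage with Resque (ImportReviewsJob is an assumed job class):
#   each_csv_chunk('reviews.csv') do |rows|
#     Resque.enqueue(ImportReviewsJob, rows)
#   end
```

Each chunk is a plain array of string-keyed hashes, so it can be passed to a background job as an argument; how the jobs are throttled is left to the scheduler.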