I asked a version of this question before (Open a CSV file from S3 using Roo on Heroku) but got no bites, so here's a reword:
I have a CSV file in an S3 bucket
I want to read it using Roo in a Heroku based app (i.e. no local file access)
How do I open the CSV file from a stream?
Or is there a better tool for doing this?
I am using Rails 4 and Ruby 2. Note that I can successfully open the CSV for reading if I post it from a form. How can I adapt this to grab the file from an S3 bucket?
Short answer - don't use Roo.
I ended up using the standard CSV library. When working with small CSV files you can simply read the entire file contents into memory, using something like this:
require 'csv'

body = file.read
CSV.parse(body, col_sep: ",", headers: true) do |row|
  row_hash = row.to_hash
  field = row_hash["FieldName"]
  # ... process each row here
end
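As a quick sanity check of those parsing options, here is the same call run against an inline string (the column names are made up for illustration):

```ruby
require 'csv'

sample = "FieldName,Other\nhello,1\nworld,2\n"

values = []
CSV.parse(sample, col_sep: ",", headers: true) do |row|
  row_hash = row.to_hash          # e.g. {"FieldName"=>"hello", "Other"=>"1"}
  values << row_hash["FieldName"]
end

values  # => ["hello", "world"]
```

With headers: true the first line is consumed as column names, so the block only sees data rows.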
To read a file passed in from a form, just reference the params:
file = params[:file]
body = file.read
To read from S3 you can use the AWS gem:
s3 = AWS::S3.new(access_key_id: ENV['AWS_ACCESS_KEY_ID'],
                 secret_access_key: ENV['AWS_SECRET_ACCESS_KEY'])
bucket = s3.buckets['BUCKET_NAME']

# Check each object in the bucket
bucket.objects.each do |obj|
  import_file = obj.key
  body = obj.read
  # call the same style import code as above...
end
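Since both the form path and the S3 path end up with a raw body string, the "same style import code" can be factored into one helper that both call. A sketch (parse_rows and the column names are made-up names):

```ruby
require 'csv'

# Turn a raw CSV body string into an array of row hashes.
# The body can come from params[:file].read or from obj.read
# on an S3 object - the parsing is identical either way.
def parse_rows(body)
  CSV.parse(body, col_sep: ",", headers: true).map(&:to_h)
end

# parse_rows("FieldName,Other\na,1\n")
# => [{"FieldName"=>"a", "Other"=>"1"}]
```

This keeps the S3 loop above down to fetching bodies and handing them off.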
I put some code together based on this:
Make Remote Files Local With Ruby Tempfile
and Roo seems to work OK when handed a temp file. I couldn't get it to work with S3 directly. I don't particularly like the copy approach, but my processing runs on Delayed Job, and I value Roo's features a little more than I dislike the file copy. Plain CSV files worked without my having to fish out the encoding info, but XLS files did not.
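For reference, the temp-file copy described above can be done with Ruby's stdlib Tempfile. This is a sketch (the helper name is mine, and the Roo call in the comment assumes Roo::Spreadsheet.open, its usual path-based entry point):

```ruby
require 'tempfile'

# Write a remote file's body to a local temp file so path-based
# libraries like Roo can open it. The extension matters for Roo's
# file-type detection, hence the [basename, extension] form.
def to_local_tempfile(body, ext = '.csv')
  tmp = Tempfile.new(['s3_import', ext])
  tmp.binmode     # avoid newline/encoding surprises with XLS bodies
  tmp.write(body)
  tmp.flush
  tmp.rewind
  tmp
end

# tmp = to_local_tempfile(obj.read, '.xls')
# sheet = Roo::Spreadsheet.open(tmp.path)
# ... use sheet, then:
# tmp.close!   # delete the local copy when done
```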