I am dealing with potentially huge CSV files which I want to export from my Rails app, and since it runs on Heroku, my idea was to stream these CSV files directly to S3 when generating them.
Now I have an issue: `Aws::S3` expects a file in order to perform an upload, while in my Rails app I would like to do something like:
```ruby
S3.bucket('my-bucket').object('my-csv') << %w(this is one line)
```
How can I achieve this?
One option is `file_csv = CSV.generate { ... }`, which builds a string of CSV data in Ruby. After creating this CSV string, put it to S3 under the desired bucket and path. In my code, I export all the data from an `ActionLog` model.
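A minimal sketch of that approach, assuming a hypothetical `my-bucket`, key, and `ActionLog` columns; note that it builds the entire CSV in memory before the upload, so it suits smaller exports:

```ruby
require 'csv'
require 'aws-sdk-s3'

# Build the whole CSV as one in-memory string.
file_csv = CSV.generate do |csv|
  csv << %w[id action created_at] # header row
  ActionLog.find_each do |log|
    csv << [log.id, log.action, log.created_at]
  end
end

# Put the string to S3 under the chosen bucket and path in a single request.
s3 = Aws::S3::Resource.new(region: 'us-east-1')
s3.bucket('my-bucket').object('exports/action_logs.csv').put(body: file_csv)
```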
You can use S3 multipart upload, which lets you upload a large object by splitting it into multiple chunks: https://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html
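A rough sketch of the low-level multipart flow with the v3 client, just to show what is involved (the bucket, key, and in-memory chunks are placeholders; every part except the last must be at least 5 MB):

```ruby
require 'aws-sdk-s3'

client = Aws::S3::Client.new(region: 'us-east-1')

# 1. Start the multipart upload and remember its id.
mpu = client.create_multipart_upload(bucket: 'my-bucket', key: 'my-csv')

# 2. Upload the chunks, collecting each part's ETag for the final step.
chunks = ["a,b,c\n" * 1_000_000, "d,e,f\n" * 10] # placeholder CSV chunks
parts = chunks.each_with_index.map do |chunk, i|
  part = client.upload_part(
    bucket: 'my-bucket', key: 'my-csv',
    upload_id: mpu.upload_id, part_number: i + 1, body: chunk
  )
  { etag: part.etag, part_number: i + 1 }
end

# 3. Stitch the parts together into the final object.
client.complete_multipart_upload(
  bucket: 'my-bucket', key: 'my-csv',
  upload_id: mpu.upload_id,
  multipart_upload: { parts: parts }
)
```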
Multipart upload requires more complex coding, but aws-sdk-ruby v3 supports an `upload_stream` method that executes the multipart upload internally, and it is very easy to use. It may be the exact solution for this use case: https://docs.aws.amazon.com/sdk-for-ruby/v3/api/Aws/S3/Object.html#upload_stream-instance_method

The argument yielded to the `upload_stream` block can usually be used as an IO object, which allows you to chain and wrap CSV generation as you would for a file or other IO object:
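A minimal sketch of that, reusing the hypothetical bucket, key, and `ActionLog` model from above:

```ruby
require 'csv'
require 'aws-sdk-s3'

s3 = Aws::S3::Resource.new(region: 'us-east-1')
obj = s3.bucket('my-bucket').object('my-csv')

# upload_stream yields a writable IO; everything written to it is sent
# to S3 as a multipart upload, so the full CSV never sits in memory.
obj.upload_stream do |write_stream|
  csv = CSV.new(write_stream)
  csv << %w[id action created_at] # header row
  ActionLog.find_each do |log|
    csv << [log.id, log.action, log.created_at]
  end
end
```

Or, for example, you could compress the CSV while you generate and upload it, using a tempfile to reduce the memory footprint. A sketch under the same assumptions; `tempfile: true` makes the uploader buffer each part on disk instead of in memory:

```ruby
require 'csv'
require 'zlib'
require 'aws-sdk-s3'

s3 = Aws::S3::Resource.new(region: 'us-east-1')
obj = s3.bucket('my-bucket').object('my-csv.gz')

obj.upload_stream(tempfile: true) do |write_stream|
  # Wrap the upload stream in a gzip writer and feed CSV rows into it.
  Zlib::GzipWriter.wrap(write_stream) do |gzip|
    csv = CSV.new(gzip)
    csv << %w[id action created_at]
    ActionLog.find_each do |log|
      csv << [log.id, log.action, log.created_at]
    end
  end
end
```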
I would have a look at http://docs.aws.amazon.com/AWSRubySDK/latest/AWS/S3/S3Object.html#write-instance_method as that might be what you're looking for.

EDIT: http://docs.aws.amazon.com/AmazonS3/latest/dev/UploadObjSingleOpRuby.html might be more relevant, as the first link points to the Ruby aws-sdk v1.
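For reference, a single-operation upload along the lines of those pages, written against the v3 SDK rather than v1 (the bucket name and file path are placeholders); note it uploads an existing file from disk rather than a stream:

```ruby
require 'aws-sdk-s3'

s3 = Aws::S3::Resource.new(region: 'us-east-1')
obj = s3.bucket('my-bucket').object('exports/action_logs.csv')

# upload_file transparently switches to multipart for large files,
# but it needs a file on disk rather than an in-memory stream.
obj.upload_file('/tmp/action_logs.csv')
```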