Basically, I want to stream data from memory into tar/gz format (possibly multiple files in the tar, but it should NEVER TOUCH THE HARD DRIVE, only streaming!), then stream it somewhere else (an HTTP request body in my case).
Anyone know of an existing library that can do this? Is there something in Rails?
libarchive-ruby is only a C wrapper and seems like it would be very platform-dependent (the docs want you to compile as an installation step?!).
SOLUTION:
require 'stringio'
require 'zlib'
require 'rubygems/package'

# Build the tar archive in memory.
tar = StringIO.new
Gem::Package::TarWriter.new(tar) { |writer|
  writer.add_file("a_file.txt", 0644) { |f|
    (1..1000).each { |i|
      f.write("some text\n")
    }
  }
  writer.add_file("another_file.txt", 0644) { |f|
    f.write("some more text\n")
  }
}
tar.seek(0)

# Gzip the tar data into the target IO (a File here, only for demonstration).
gz = Zlib::GzipWriter.new(File.new('this_is_a_tar_gz.tar.gz', 'wb')) # Make sure you use 'wb' for binary write!
gz.write(tar.read)
tar.close
gz.close
That's it! You can swap out the File in the GzipWriter with any IO to keep it streaming. Cookies for dw11wtq!
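To illustrate the "any IO" point, here is a minimal sketch, assuming the byte size of each entry is known up front. TarWriter#add_file needs a seekable IO (it sets the position back to fill in the header), so it cannot write straight into a GzipWriter; add_file_simple takes the size as an argument and does not seek. The helper name stream_tar_gz and the file names are made up for illustration, and the StringIO only stands in for whatever IO you actually stream to.

```ruby
require 'stringio'
require 'zlib'
require 'rubygems/package'

# Hypothetical helper: writes a gzipped tar of `files` (name => contents)
# straight into `out`, without building the whole tar in memory first.
def stream_tar_gz(out, files)
  gz = Zlib::GzipWriter.new(out)
  Gem::Package::TarWriter.new(gz) do |writer|
    files.each do |name, body|
      # add_file_simple gets the byte size up front, so no seeking is needed.
      writer.add_file_simple(name, 0644, body.bytesize) { |f| f.write(body) }
    end
  end
  gz.finish # writes the gzip trailer but leaves `out` open
  out
end

out = StringIO.new(String.new) # stand-in for a socket, pipe, or other IO
stream_tar_gz(out, { 'a_file.txt' => "some text\n" * 1000,
                     'another_file.txt' => "some more text\n" })
puts "wrote #{out.string.bytesize} compressed bytes"
```

Because the tar bytes go through the GzipWriter as they are produced, only the current entry's contents need to be held in memory at once.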
Take a look at the TarWriter class in rubygems: http://rubygems.rubyforge.org/rubygems-update/Gem/Package/TarWriter.html
It just operates on an IO stream, which may be a StringIO.
require 'stringio'
require 'rubygems/package'

tar = StringIO.new
Gem::Package::TarWriter.new(tar) do |writer|
  writer.add_file("hello_world.txt", 0644) { |f| f.write("Hello world!\n") }
end
tar.seek(0)
p tar.read #=> mostly padding, but a tar nonetheless
It also provides methods to add directories if you need a directory layout in the tarball.
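As a rough sketch of the directory support, assuming TarWriter#mkdir(name, mode) as found in the rubygems versions I have looked at: the helper name tar_with_directory and the "docs" layout are made up for illustration.

```ruby
require 'stringio'
require 'rubygems/package'

# Hypothetical helper: builds a small in-memory tar with one directory
# entry and one file inside that directory.
def tar_with_directory
  tar = StringIO.new
  Gem::Package::TarWriter.new(tar) do |writer|
    writer.mkdir("docs", 0755) # directory entry
    writer.add_file("docs/hello_world.txt", 0644) { |f| f.write("Hello world!\n") }
  end
  tar.rewind
  tar
end

# List what ended up in the archive.
Gem::Package::TarReader.new(tar_with_directory).each do |entry|
  kind = entry.directory? ? "dir " : "file"
  puts "#{kind} #{entry.full_name}"
end
```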
For reference, you could achieve the gzipping with IO.popen, just piping the data in/out of the system process:
http://www.ruby-doc.org/core-1.9.2/IO.html#method-c-popen
The gzipping itself would look something like this:
gzipped_data = IO.popen("gzip", "w+") do |gzip|
  gzip.puts "Hello world!"
  gzip.close_write
  gzip.read
end
# => "\u001F\x8B\b\u0000\xFD\u001D\xA2N\u0000\u0003\xF3H\xCD\xC9\xC9W(\xCF/\xCAIQ\xE4\u0002\u0000A䩲\r\u0000\u0000\u0000"
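If spawning a system process is not an option (no gzip binary on the host, for instance), the same step can be done in-process with Zlib. This is only a sketch for comparison; gzip_in_memory is a made-up helper name.

```ruby
require 'stringio'
require 'zlib'

# Hypothetical helper: gzips a string in-process, with no external
# `gzip` binary involved.
def gzip_in_memory(data)
  out = StringIO.new(String.new) # mutable binary buffer
  gz = Zlib::GzipWriter.new(out)
  gz.write(data)
  gz.finish # flush and write the trailer, keep `out` usable
  out.string
end

gzipped_data = gzip_in_memory("Hello world!\n")
# Round-trip through GzipReader to check the result is valid gzip.
puts Zlib::GzipReader.new(StringIO.new(gzipped_data)).read
```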
Based on the solution the OP wrote, I wrote a fully in-memory tgz archive function that I use to POST to a web server.
require 'stringio'
require 'zlib'
require 'rubygems/package'

# Create a tar.gz archive from files, entirely in memory.
# Parameters:
# files: Array of hashes with string keys "filename" and "body"
# Ex: [{"filename" => "foo.txt", "body" => "This is foo.txt"},...]
#
# Return:: tar.gz archive image as a binary string
def create_tgz_archive_from_files(files)
  # Build the tar in memory.
  tar = StringIO.new
  Gem::Package::TarWriter.new(tar) { |tar_writer|
    files.each { |file|
      tar_writer.add_file(file['filename'], 0644) { |f|
        f.write(file['body'])
      }
    }
  }
  tar.rewind

  # Gzip the tar into a second in-memory binary buffer.
  gz = StringIO.new(String.new, 'r+b')
  gz.set_encoding("BINARY")
  gz_writer = Zlib::GzipWriter.new(gz)
  gz_writer.write(tar.read)
  tar.close
  gz_writer.finish # writes the gzip trailer without closing gz
  gz.rewind
  tar_gz_buf = gz.read
  return tar_gz_buf
end
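For the POST step, something along these lines should work with Net::HTTP from the standard library. It is a sketch under assumptions: build_tgz_upload_request is a made-up helper, the URL is a placeholder, and application/gzip is my guess at a sensible content type; your server may expect something else (a multipart form, for example).

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: wraps already-built .tar.gz bytes in a POST request.
# The URL and the content type are assumptions for illustration.
def build_tgz_upload_request(url, tar_gz_buf)
  uri = URI.parse(url)
  request = Net::HTTP::Post.new(uri)
  request.content_type = 'application/gzip'
  request.body = tar_gz_buf
  request
end

# Placeholder bytes here; in practice pass the string returned by the
# archive function.
request = build_tgz_upload_request('http://example.com/upload', 'placeholder bytes')
puts request['Content-Type']

# To actually send it (not run here):
# uri = request.uri
# Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
#   http.request(request)
# end
```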