Ruby streaming tar/gz

2019-04-09 12:34发布

Basically I want to stream data from memory into a tar/gz format (possibly multiple files into the tar, but it should NEVER TOUCH THE HARDDRIVE, only streaming!), then stream them somewhere else (an HTTP request body in my case).

Anyone know of an existing library that can do this? Is there something in Rails?

libarchive-ruby is only a C wrapper and seems like it would be very platform-dependent (the docs want you to compile as an installation step?!).

SOLUTION:

require 'zlib'
require 'rubygems/package'

tar = StringIO.new

Gem::Package::TarWriter.new(tar) { |writer|
  writer.add_file("a_file.txt", 0644) { |f| 
    (1..1000).each { |i| 
      f.write("some text\n")
    }
  }
  writer.add_file("another_file.txt", 0644) { |f| 
    f.write("some more text\n")
  }
}
tar.seek(0)

gz = Zlib::GzipWriter.new(File.new('this_is_a_tar_gz.tar.gz', 'wb'))  # Make sure you use 'wb' for binary write!
gz.write(tar.read)
tar.close
gz.close

That's it! You can swap out the File in the GzipWriter with any IO to keep it streaming. Cookies for dw11wtq!

标签: ruby stream tar gz
2条回答
Ridiculous、
2楼-- · 2019-04-09 12:54

Based on the solution OP wrote, I wrote fully on-memory tgz archive function what I want to use to POST to web server.

  # Create tar gz archive file from files, on the memory.
  # Parameters:
  #   files: Array of hash with key "filename" and "body"
  #     Ex: [{"filename": "foo.txt", "body": "This is foo.txt"},...]
  #
  # Return:: tar_gz archived image as string
  def create_tgz_archive_from_files(files)
    tar = StringIO.new
    Gem::Package::TarWriter.new(tar){ |tar_writer|
      files.each{|file|
        tar_writer.add_file(file['filename'], 0644){|f|
          f.write(file['body'])
        }
      }
    }
    tar.rewind

    gz = StringIO.new('', 'r+b')
    gz.set_encoding("BINARY")
    gz_writer = Zlib::GzipWriter.new(gz)
    gz_writer.write(tar.read)
    tar.close
    gz_writer.finish
    gz.rewind
    tar_gz_buf = gz.read
    return tar_gz_buf
  end
查看更多
狗以群分
3楼-- · 2019-04-09 12:57

Take a look at the TarWriter class in rubygems: http://rubygems.rubyforge.org/rubygems-update/Gem/Package/TarWriter.html it just operates on an IO stream, which may be a StringIO.

tar = StringIO.new

Gem::Package::TarWriter.new(tar) do |writer|
  writer.add_file("hello_world.txt", 0644) { |f| f.write("Hello world!\n") }
end

tar.seek(0)

p tar.read #=> mostly padding, but a tar nonetheless

It also provides methods to add directories if you need a directory layout in the tarball.

For reference, you could achieve the gzipping with IO.popen, just piping the data in/out of the system process:

http://www.ruby-doc.org/core-1.9.2/IO.html#method-c-popen

The gzipping itself would look something like this:

gzippped_data = IO.popen("gzip", "w+") do |gzip|
  gzip.puts "Hello world!"
  gzip.close_write
  gzip.read
end
# => "\u001F\x8B\b\u0000\xFD\u001D\xA2N\u0000\u0003\xF3H\xCD\xC9\xC9W(\xCF/\xCAIQ\xE4\u0002\u0000A䩲\r\u0000\u0000\u0000"
查看更多
登录 后发表回答