Reading files in a zip archive, without unzipping

Posted 2020-07-11 05:18

Question:

I have a directory with 100+ zip files, and I need to read the files inside them for some data processing without unzipping the archives.

Is there a Ruby library to read the contents of files in zip archives, without unzipping the file?

I tried rubyzip, but the following code raises an error:

require 'zip'

Zip::File.open('my_zip.zip') do |zip_file|
  # Handle entries one by one
  zip_file.each do |entry|
    # Extract to file/directory/symlink
    puts "Extracting #{entry.name}"
    entry.extract('here')

    # Read into memory
    content = entry.get_input_stream.read
  end
end 

This is the full error:

test.rb:12:in `block (2 levels) in <main>': undefined method `read' for Zip::NullInputStream:Module (NoMethodError)
    from .gem/ruby/gems/rubyzip-1.1.6/lib/zip/entry_set.rb:42:in `call'
    from .gem/ruby/gems/rubyzip-1.1.6/lib/zip/entry_set.rb:42:in `block in each'
    from .gem/ruby/gems/rubyzip-1.1.6/lib/zip/entry_set.rb:41:in `each'
    from .gem/ruby/gems/rubyzip-1.1.6/lib/zip/entry_set.rb:41:in `each'
    from .gem/ruby/gems/rubyzip-1.1.6/lib/zip/central_directory.rb:182:in `each'
    from test.rb:6:in `block in <main>'
    from .gem/ruby/gems/rubyzip-1.1.6/lib/zip/file.rb:99:in `open'
    from test.rb:4:in `<main>'

Answer 1:

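Could one of your entries be a directory? rubyzip's get_input_stream returns Zip::NullInputStream when the entry is a directory rather than a file, and in rubyzip 1.1.6 (the version in your backtrace) that is a module with no read method, hence the NoMethodError.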

Here is a more robust version of the code, which checks each entry's type before reading it:

#!/usr/bin/env ruby

require 'zip'

Zip::File.open('my_zip.zip') do |zip_file|
  # Handle entries one by one
  zip_file.each do |entry|
    if entry.directory?
      puts "#{entry.name} is a folder!"
    elsif entry.symlink?
      puts "#{entry.name} is a symlink!"
    elsif entry.file?
      puts "#{entry.name} is a regular file!"

      # Read into memory
      content = entry.get_input_stream.read

      # Output
      puts content
    else
      puts "#{entry.name} is something unknown, oops!"
    end
  end
end
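The question is about a whole directory of 100+ archives rather than a single file. A minimal sketch of extending the code above to that case follows; it assumes rubyzip is installed and loaded by the caller with require 'zip' (the block itself does not require it), and read_file_entries and each_zip_in are hypothetical helper names, not part of rubyzip's API. It uses the block form of get_input_stream, which closes the stream when the block returns instead of leaving it open until garbage collection, something that matters when hundreds of archives are processed in one run.

```ruby
# Sketch only: assumes rubyzip is installed and already loaded by the caller
# (require 'zip'). read_file_entries and each_zip_in are hypothetical names.

# Returns { entry_name => content } for the regular files in one open archive.
# Works with anything whose #each yields rubyzip-style entries
# (responding to #file?, #name and #get_input_stream).
def read_file_entries(zip_file)
  contents = {}
  zip_file.each do |entry|
    # Skip directories and symlinks, so Zip::NullInputStream is never read.
    next unless entry.file?

    # Block form: the stream is closed as soon as the block returns.
    contents[entry.name] = entry.get_input_stream { |io| io.read }
  end
  contents
end

# Yields (archive_path, contents_hash) for every *.zip file directly in dir.
# Entries are read in memory; nothing is extracted to disk.
def each_zip_in(dir)
  Dir.glob(File.join(dir, '*.zip')).sort.each do |path|
    Zip::File.open(path) do |zip_file|
      yield path, read_file_entries(zip_file)
    end
  end
end
```

For example, each_zip_in('archives') { |path, files| puts "#{path}: #{files.size} files" } prints one line per archive, where 'archives' is a placeholder directory name. Reading whole entries into Strings is simplest; for very large entries a streaming approach may be preferable.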


Answer 2:

I ran into the same issue, and checking entry.file? before calling entry.get_input_stream.read resolved it.

require 'zip'

Zip::File.open('my_zip.zip') do |zip_file|
  # Handle entries one by one
  zip_file.each do |entry|
    # Read regular files into memory; directories would return
    # Zip::NullInputStream. Nothing is extracted to disk.
    if entry.file?
      content = entry.get_input_stream.read
    end
  end
end
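If the data processing works line by line, the same entry.file? guard can be combined with streaming instead of read, so no entry is first loaded into one large String. This is a hedged sketch: it assumes get_input_stream accepts a block and yields an IO-like stream with each_line (rubyzip's InputStream provides one), and count_lines_per_file is a hypothetical name standing in for real per-line processing.

```ruby
# Sketch only: counts lines in each regular file of an open archive while
# streaming, instead of reading each entry into one String first.
# Assumes rubyzip-style entries whose get_input_stream accepts a block and
# yields an IO-like stream with #each_line. count_lines_per_file is a
# hypothetical name.
def count_lines_per_file(zip_file)
  counts = {}
  zip_file.each do |entry|
    # Same guard as above: only regular files have readable content.
    next unless entry.file?

    lines = 0
    # Block form closes the stream once every line has been consumed.
    entry.get_input_stream { |io| io.each_line { lines += 1 } }
    counts[entry.name] = lines
  end
  counts
end
```

With rubyzip loaded, Zip::File.open('my_zip.zip') { |zf| p count_lines_per_file(zf) } prints a hash of entry names to line counts; the counter is where real per-line processing would go.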


Tags: ruby