How do I read a gzip file line by line?

Posted 2019-02-12 16:21

Question:

I have a gzip file and currently I read it like this:

require 'zlib'

infile = open("file.log.gz")
gz = Zlib::GzipReader.new(infile)
output = gz.read  # reads the entire decompressed file into one string
puts output

I think this reads the whole file into a single string, but I would like to read it line by line.

The file contains some warning messages mixed with garbage. I want to grep those warning messages and write them to another file. Some warning messages are repeated, though, so I have to make sure that I only grep each one once. Reading line by line would help with that.

Answer 1:

You should be able to simply loop over the gzip reader the same way you do with regular streams (according to the docs):

require 'zlib'

infile = open("file.log.gz")
gz = Zlib::GzipReader.new(infile)
# each_line yields one decompressed line at a time
gz.each_line do |line|
  puts line
end
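
As a possible variation, Zlib::GzipReader.open takes a file name and a block, and closes the file when the block finishes. Here is a minimal sketch of that form. The helper name gzip_lines and the sample file are made up for illustration and are not part of the original answer.

```ruby
require 'zlib'
require 'tmpdir'

# Hypothetical helper: returns the lines of a gzip file as an array,
# reading them one at a time. The block form closes the file for us.
def gzip_lines(path)
  lines = []
  Zlib::GzipReader.open(path) do |gz|
    gz.each_line { |line| lines << line.chomp }
  end
  lines
end

# Build a small sample file so the sketch runs on its own.
Dir.mktmpdir do |dir|
  path = File.join(dir, "sample.log.gz")
  Zlib::GzipWriter.open(path) { |gz| gz.write("first\nsecond\n") }
  gzip_lines(path).each { |line| puts line }
end
```

Collecting every line into an array is only reasonable for small files. For a large log you would do the work inside the each_line block instead.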


Answer 2:

Try this (gets returns nil at the end of the file, which ends the loop):

require 'zlib'

infile = open("file.log.gz")
gz = Zlib::GzipReader.new(infile)
while output = gz.gets
  puts output
end


Answer 3:

Other answers show how to read the file line by line, but not how to capture each error only once. Building on @Tigraine's answer:

require 'set'
require 'zlib'

infile = open("file.log.gz")
gz = Zlib::GzipReader.new(infile)

errors = Set.new
# or ...
# errors = [].to_set

gz.each_line do |line|
  errors << line if (line[/^Error:/])
  # or ...
  # errors << line if (line['Error:'])
end

puts errors

Set acts like an Array but is built on a Hash. It is like a Hash where only the keys matter, so only unique values are stored. If you try to add a duplicate, it is discarded, leaving you with only the unique values. You could use an Array and call uniq on it afterwards, but a Set manages this for you up front.

>> require 'set'
=> true
>> errors = Set.new
=> #<Set: {}>
>> errors << 'a'
=> #<Set: {"a"}>
>> errors << 'b'
=> #<Set: {"a", "b"}>
>> errors << 'a'
=> #<Set: {"a", "b"}>
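
To tie this back to the question (writing each matching message to another file once), here is a rough sketch under a few assumptions. The helper name write_unique_matches, the /^Warning:/ pattern, and the sample file contents are all invented for illustration, so adjust them to whatever your log actually looks like. It relies on Set#add?, which returns nil when the element is already in the set.

```ruby
require 'zlib'
require 'set'
require 'tmpdir'

# Hypothetical helper: copies each distinct line matching `pattern` from a
# gzip file into a plain-text file, in first-seen order.
# Returns the number of distinct lines written.
def write_unique_matches(gz_path, out_path, pattern)
  seen = Set.new
  File.open(out_path, "w") do |out|
    Zlib::GzipReader.open(gz_path) do |gz|
      gz.each_line do |line|
        next unless line.match?(pattern)
        # Set#add? returns nil for a duplicate, so repeats are skipped.
        out.write(line) if seen.add?(line)
      end
    end
  end
  seen.size
end

# Sample data so the sketch runs on its own (assumed log format).
Dir.mktmpdir do |dir|
  gz_path = File.join(dir, "sample.log.gz")
  out_path = File.join(dir, "warnings.txt")
  Zlib::GzipWriter.open(gz_path) do |gz|
    gz.write("Warning: disk low\nnoise\nWarning: disk low\nWarning: cpu hot\n")
  end
  puts write_unique_matches(gz_path, out_path, /^Warning:/)
  puts File.read(out_path)
end
```

Writing as you go means only the unique matching lines are held in memory, not the whole file.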