How do I read a gzip file line by line?

Posted 2019-02-12 16:49

I have a gzip file and currently I read it like this:

require 'zlib'

infile = open("file.log.gz")
gz = Zlib::GzipReader.new(infile)
output = gz.read
puts output

I think this reads the whole file into a single string, but I would like to read it line by line.

What I want to accomplish: the file contains warning messages mixed with some garbage. I want to grep those warning messages and write them to another file. Some warning messages are repeated, so I have to make sure that I only capture each one once. Reading line by line would help with that.

3 answers
姐就是有狂的资本
#2 · 2019-02-12 17:05

Try this:

require 'zlib'

infile = open("file.log.gz")
gz = Zlib::GzipReader.new(infile)
# gets returns nil at end of file, which ends the loop
while (line = gz.gets)
  puts line
end
gz.close
贪生不怕死
#3 · 2019-02-12 17:17

Other answers show how to read the file line by line, but not how to capture each error only once. Building on @Tigraine's answer:

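require 'zlib'
require 'set'

infile = open("file.log.gz")
gz = Zlib::GzipReader.new(infile)

errors = Set.new
# or ...
# errors = [].to_set

gz.each_line do |line|
  # keep only lines that start with "Error:"; the Set drops repeats
  errors << line if line[/^Error:/]
  # or, to match "Error:" anywhere in the line ...
  # errors << line if line['Error:']
end
gz.close

# to_a makes puts print one entry per line instead of the Set's inspect string
puts errors.to_a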

Set behaves like an Array but is built on a Hash. In effect it is a Hash where only the keys matter, so only unique values are stored. If you try to add a duplicate it is discarded, leaving you with only the unique values. You could use an Array and call uniq on it afterwards, but a Set handles this for you up front.

>> require 'set'
=> true
>> errors = Set.new
=> #<Set: {}>
>> errors << 'a'
=> #<Set: {"a"}>
>> errors << 'b'
=> #<Set: {"a", "b"}>
>> errors << 'a'
=> #<Set: {"a", "b"}>
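
To tie this back to the question (write each warning to another file only once), here is a minimal sketch. The helper name extract_unique_lines, the output file name, and the /^Warning:/ pattern are assumptions for illustration; adjust the pattern to whatever the real warning lines look like. It writes a small sample gzip file first so it can be run on its own.

```ruby
require 'zlib'
require 'set'
require 'tmpdir'

# Hypothetical helper: copy each distinct line matching `pattern`
# from a gzip file into a plain-text file, in first-seen order.
# Returns the number of distinct lines written.
def extract_unique_lines(gz_path, out_path, pattern)
  seen = Set.new
  Zlib::GzipReader.open(gz_path) do |gz|
    File.open(out_path, 'w') do |out|
      gz.each_line do |line|
        # Set#add? returns nil when the line is already in the set
        out.write(line) if line.match?(pattern) && seen.add?(line)
      end
    end
  end
  seen.size
end

# Demo on a throwaway file (sample data, not from the original log).
Dir.mktmpdir do |dir|
  gz_path  = File.join(dir, 'sample.log.gz')
  out_path = File.join(dir, 'warnings.txt')
  Zlib::GzipWriter.open(gz_path) do |gz|
    gz.write("Warning: disk low\ngarbage\nWarning: disk low\nWarning: cpu hot\n")
  end
  count = extract_unique_lines(gz_path, out_path, /^Warning:/)
  puts "#{count} unique warnings written"
  puts File.read(out_path)
end
```

Writing inside the loop means only the set of distinct warnings is held in memory, not the whole decompressed file.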
混吃等死
#4 · 2019-02-12 17:27

You should be able to simply loop over the gzip reader as you would with a regular stream (according to the docs):

require 'zlib'

infile = open("file.log.gz")
gz = Zlib::GzipReader.new(infile)
gz.each_line do |line|
  puts line
end
gz.close
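
A variation on the same loop, as a sketch: the block form of Zlib::GzipReader.open opens the file by name and closes it when the block returns, so there is no separate file handle to manage. The helper name each_gzip_line and the sample file are assumptions for illustration.

```ruby
require 'zlib'
require 'tmpdir'

# Hypothetical helper: yield each line of a gzip file to the caller's block.
# The reader is closed automatically when the open block finishes.
def each_gzip_line(path, &block)
  Zlib::GzipReader.open(path) do |gz|
    gz.each_line(&block)
  end
end

# Demo on a throwaway file so the sketch runs on its own.
Dir.mktmpdir do |dir|
  path = File.join(dir, 'sample.log.gz')
  Zlib::GzipWriter.open(path) { |gz| gz.write("first\nsecond\n") }
  each_gzip_line(path) { |line| puts line }
end
```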