How to read a file from bottom to top in Ruby?

Posted 2019-01-23 05:25

Question:

I've been working on a log viewer for a Rails app and have found that I need to read around 200 lines of a log file from bottom to top instead of the default top to bottom.

Log files can get quite large, so I've already tried and ruled out the IO.readlines("log_file.log")[-200..-1] method.

Are there any other ways to go about reading a file backwards in Ruby without the need for a plugin or gem?

Answer 1:

The only correct way to do this that also works on enormous files is to read a fixed-size chunk of bytes at a time from the end until you have the number of lines that you want. This is essentially how Unix tail works.

An example implementation of IO#tail(n), which returns the last n lines as an Array:

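class IO
  TAIL_BUF_LENGTH = 1 << 16  # read in 64 KiB chunks

  def tail(n)
    return [] if n < 1

    # Start one chunk before the end of the file.
    seek(-TAIL_BUF_LENGTH, SEEK_END)

    buf = ""
    # Prepend chunks until the buffer holds more than n newlines.
    while buf.count("\n") <= n
      buf = read(TAIL_BUF_LENGTH) + buf
      # The read moved the position forward by one chunk, so step back
      # two chunks to land on the chunk before the one just read.
      # Caveat: seeking before the start of the file raises Errno::EINVAL,
      # so this fails on files shorter than two chunks or with too few lines.
      seek(2 * -TAIL_BUF_LENGTH, SEEK_CUR)
    end

    buf.split("\n")[-n..-1]
  end
end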

The implementation is a little naive, but a quick benchmark shows what a ridiculous difference this simple implementation can already make (tested with a ~25MB file generated with yes > yes.txt):

                            user     system      total        real
f.readlines[-200..-1]   7.150000   1.150000   8.300000 (  8.297671)
f.tail(200)             0.000000   0.000000   0.000000 (  0.000367)

The benchmark code:

require "benchmark"

FILE = "yes.txt"

Benchmark.bmbm do |b|
  b.report "f.readlines[-200..-1]" do
    File.open(FILE) do |f|
      f.readlines[-200..-1]
    end
  end

  b.report "f.tail(200)" do
    File.open(FILE) do |f|
      f.tail(200)
    end
  end
end

Of course, other implementations already exist. I haven't tried any, so I cannot tell you which is best.



Answer 2:

There's a module Elif available (a port of Perl's File::ReadBackwards) which does efficient line-by-line backwards reading of files.
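For readers who want to see the idea without installing anything, here is a minimal stdlib-only sketch of backwards line-by-line reading. It is not Elif's API or implementation. The module name ReverseLines, its each method and the chunk_size parameter are made up for illustration, and it assumes lines are separated by "\n".

```ruby
# Hypothetical helper (not part of Elif): yields the lines of a file
# from the last line to the first, reading fixed-size chunks from the end.
module ReverseLines
  def self.each(path, chunk_size = 1 << 16)
    return enum_for(:each, path, chunk_size) unless block_given?

    File.open(path, "rb") do |f|
      pos = f.size
      return if pos.zero?              # empty file: nothing to yield

      carry = ""                       # start of a line that began in an earlier chunk
      first_chunk = true
      while pos > 0
        len = [chunk_size, pos].min    # never read past the start of the file
        pos -= len
        f.seek(pos)
        data = f.read(len) + carry
        if first_chunk
          # Drop the newline that terminates the final line of the file.
          data = data.delete_suffix("\n")
          first_chunk = false
        end
        lines = data.split("\n", -1)
        lines = [""] if lines.empty?   # "".split returns [], keep one empty piece
        carry = lines.shift            # may be incomplete, finish it with the next chunk
        lines.reverse_each { |line| yield line }
      end
      yield carry                      # the first line of the file comes out last
    end
  end
end
```

Called without a block it returns an Enumerator, so something like ReverseLines.each("log_file.log").first(200) would stop reading as soon as 200 lines have been produced. Lines come back as binary strings because the file is opened in "rb" mode.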



Answer 3:

Since I'm too new to comment on molf's awesome answer, I have to post this as a separate answer. I needed this feature to read log files while they're being written; the last portion of the log contains the string that tells me it's done and that I can start parsing it.

Hence handling small files is crucial for me (I might ping the log while it's tiny). So I enhanced molf's code:

class IO
    def tail(n)
        return [] if n < 1
        size = File.size(self)
        tail_buf_length = 1 << 16
        if size < tail_buf_length
            # Small file: read it all from the start and keep the last n lines
            self.rewind
            return self.readlines.map(&:chomp).last(n)
        end
        pos = size
        out = ""
        # Read chunks backwards until more than n newlines are buffered
        # or the start of the file is reached
        while out.count("\n") <= n && pos > 0
            len = [tail_buf_length, pos].min
            pos -= len
            self.seek(pos, IO::SEEK_SET)
            # Prepend, so the chunks stay in file order
            out = self.read(len) + out
        end
        return out.split("\n").last(n)
    end
end
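
For the use case described above (polling a log that is still being written until a completion string shows up near the end), a smaller check may be enough. This is only a sketch: the helper name log_finished?, the marker argument and the 4096-byte default window are assumptions for illustration, not part of either answer.

```ruby
# Hypothetical helper: returns true if `marker` occurs within the last
# `window` bytes of the file at `path`.
def log_finished?(path, marker, window = 4096)
  File.open(path, "rb") do |f|
    # Clamp to 0 so that tiny files do not cause a seek before the start.
    start = [f.size - window, 0].max
    f.seek(start)
    # Compare as bytes; the file was opened in binary mode.
    f.read.include?(marker.b)
  end
end
```

The window has to be at least as large as the marker plus whatever can be written after it, otherwise the marker can fall outside the bytes that are inspected.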