How can I tail a zipped file without reading its e

I want to emulate the functionality of gzcat | tail -n.

This would be helpful when the files are huge (a few GB or so). Can I tail the last few lines of such a file without reading it from the beginning? I doubt this is possible, since I'd guess that for gzip the encoding depends on all the previous text.

But I'd still like to hear if anyone has tried something similar, perhaps by investigating a compression algorithm that could provide such a feature.

Tags: algorithm compression

6 answers

forever°为你锁心

#2 · 2019-01-23 11:06

If it's an option, then bzip2 might be a better compression algorithm to use for this purpose.

Bzip2 uses a block compression scheme. As such, if you take a chunk of the end of your file which you are sure is large enough to contain all of the last chunk, then you can recover it with bzip2recover.

The block size is selectable at the time the file is written. In fact that's what happens when you set -1 (or --fast) to -9 (or --best) as compression options, which correspond to block sizes of 100k to 900k. The default is 900k.
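As a rough illustration of the block-size setting, here is a minimal sketch assuming Python's standard bz2 module rather than the command line tool. Its compresslevel argument (1 to 9) plays the role of -1 to -9, and the chosen level is recorded as the fourth byte of the stream header ("BZh" followed by a digit). The helper name block_size_bytes is made up for this example.

```python
import bz2

def block_size_bytes(compressed: bytes) -> int:
    """Read the nominal block size from a bzip2 stream header ("BZh" + digit)."""
    if compressed[:3] != b"BZh":
        raise ValueError("not a bzip2 stream")
    level = compressed[3] - ord("0")
    if not 1 <= level <= 9:
        raise ValueError("unexpected block-size digit")
    # Each level step corresponds to 100k of block size (1 -> 100k ... 9 -> 900k).
    return level * 100_000

data = b"some log line\n" * 1000
fast = bz2.compress(data, 1)  # comparable to bzip2 -1 / --fast
best = bz2.compress(data, 9)  # comparable to bzip2 -9 / --best
print(block_size_bytes(fast), block_size_bytes(best))
```

A smaller block size means less data has to be recovered and decompressed to reach the end of the file, at the cost of a somewhat worse compression ratio.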

The bzip2 command line tools don't give you a nice friendly way to do this with a pipeline, but then given bzip2 is not stream oriented, perhaps that's not surprising.


神经病院院长

#3 · 2019-01-23 11:09

An example of a fully gzip-compatible pseudo-random access format is dictzip:

"For compression, the file is divided up into "chunks" of data, each chunk is less than 64kB. [...]

To perform random access on the data, the offset and length of the data are provided to library routines. These routines determine the chunk in which the desired data begins, and decompresses that chunk. Consecutive chunks are decompressed as necessary."
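The chunking idea can be sketched with Python's zlib module. This is not the dictzip format itself: the sketch writes a raw deflate stream and keeps the chunk table in memory, whereas dictzip keeps its chunk table in the gzip header. CHUNK, compress_chunked, read_chunk and tail_lines are hypothetical names chosen for the example. The key step is the full flush after each chunk, which byte-aligns the output and resets the compressor's history so that decompression can restart at a chunk boundary.

```python
import zlib

CHUNK = 60_000  # hypothetical chunk size, kept under 64 kB as in the description

def compress_chunked(data: bytes):
    """Return (raw deflate bytes, [(offset, length), ...]) with one entry per chunk."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # -15: raw deflate, no header
    out = bytearray()
    index = []
    for start in range(0, len(data), CHUNK):
        offset = len(out)
        out += comp.compress(data[start:start + CHUNK])
        # Full flush: byte-align the output and forget earlier data, so the
        # next chunk does not refer back to anything before this point.
        out += comp.flush(zlib.Z_FULL_FLUSH)
        index.append((offset, len(out) - offset))
    out += comp.flush()  # terminate the deflate stream
    return bytes(out), index

def read_chunk(blob: bytes, index, i: int) -> bytes:
    """Decompress only chunk i, starting at its recorded offset."""
    offset, length = index[i]
    return zlib.decompressobj(-15).decompress(blob[offset:offset + length])

def tail_lines(blob: bytes, index, n: int):
    """Return the last n lines (n >= 1), reading chunks from the end as needed."""
    buf = b""
    for i in reversed(range(len(index))):
        buf = read_chunk(blob, index, i) + buf
        if buf.count(b"\n") > n:  # enough complete lines collected
            break
    return buf.splitlines()[-n:]

data = b"".join(b"line %d\n" % i for i in range(50000))
blob, index = compress_chunked(data)
print(len(index), tail_lines(blob, index, 3))
```

Only the chunks needed for the requested lines are decompressed; the rest of the stream is never touched. The price is a slightly lower compression ratio, since each chunk starts with an empty history.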


闹够了就滚

#4 · 2019-01-23 11:13

zindex creates and queries an index on a compressed, line-based text file in a time- and space-efficient way.

https://github.com/mattgodbolt/zindex


对你真心纯属浪费

#5 · 2019-01-23 11:24

No, you can't. The zipping algorithm works on streams and adapts its internal codings to what the stream contains to achieve its high compression ratio.

Without knowing what the stream contained before a certain point, it's impossible to know how to decompress from that point on.

Any algorithm that allows you to decompress arbitrary parts of the data will require multiple passes over the data to compress it.


做自己的国王

#6 · 2019-01-23 11:25

If you have control over what goes into the file in the first place, and the format is anything like a ZIP file, you could store chunks of a predetermined size under filenames in increasing numerical order, and then decompress just the last chunk/file.
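A minimal sketch of this approach, assuming Python's standard zipfile module and chunks measured in lines rather than bytes. The function names and the zero-padded entry names are choices made for the example. Because a ZIP archive keeps its directory at the end of the file, the reader can locate the last entry without scanning the earlier ones.

```python
import io
import zipfile

def write_chunked_zip(target, lines, lines_per_chunk=1000):
    """Store lines in numbered entries: 00000000, 00000001, ..."""
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for n, start in enumerate(range(0, len(lines), lines_per_chunk)):
            chunk = "".join(lines[start:start + lines_per_chunk])
            zf.writestr(f"{n:08d}", chunk)

def tail_chunked_zip(source, count):
    """Return up to `count` lines from the last entry only."""
    with zipfile.ZipFile(source) as zf:
        last = max(zf.namelist())  # zero-padded names sort in numerical order
        text = zf.read(last).decode("utf-8")
    return text.splitlines()[-count:]

archive = io.BytesIO()  # stands in for a file on disk
write_chunked_zip(archive, [f"line {i}\n" for i in range(2500)])
print(tail_chunked_zip(archive, 2))
```

This sketch looks at the last entry only, so it returns fewer lines than requested when that entry is short; a fuller version would step back through earlier entries.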


可以哭但决不认输i

#7 · 2019-01-23 11:26

BGZF is used to create indexed, gzip-compressed BAM files with Samtools. These files are randomly accessible.

http://samtools.sourceforge.net/
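The underlying idea, a file made of independently compressed gzip blocks plus a table of block offsets, can be sketched with Python's gzip module. This does not produce real BGZF: it omits the BGZF-specific header field and block size limit, and write_members and read_from are hypothetical names. The result is still an ordinary multi-member gzip file that standard tools can decompress in full.

```python
import gzip

def write_members(chunks):
    """Concatenate independently compressed gzip members; return (blob, offsets)."""
    blob = bytearray()
    offsets = []
    for chunk in chunks:
        offsets.append(len(blob))     # where this member starts
        blob += gzip.compress(chunk)  # each member is a complete gzip stream
    return bytes(blob), offsets

def read_from(blob, offsets, i):
    """Decompress from member i to the end, skipping everything before it."""
    return gzip.decompress(blob[offsets[i]:])

chunks = [b"first block\n" * 3, b"second block\n" * 3, b"last line\n"]
blob, offsets = write_members(chunks)
print(read_from(blob, offsets, len(offsets) - 1))
```

With the offsets stored in a side index, tailing the file reduces to seeking to the last recorded offset and decompressing from there.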
