find string inside a gzipped file in a folder

Posted 2020-02-08 03:53

Question:

My current problem is that I have around 10 folders, each containing gzipped files (about 5 each on average). That makes roughly 50 files to open and look at.

Is there a simpler method to find out if a gzipped file inside a folder has a particular pattern or not?

zcat ABC/myzippedfile1.txt.gz | grep "pattern match"
zcat ABC/myzippedfile2.txt.gz | grep "pattern match"

Instead of writing a script, can I do the same in a single line, for all the folders and subfolders?

for f in `ls *.gz`; do echo $f; zcat $f | grep <pattern>; done;

Answer 1:

zgrep searches inside gzipped files and has an -H option to show the filename. Where zgrep accepts the recursive -R option (not every version does), one command covers the whole tree:

zgrep -R --include='*.gz' -H "pattern match" .


Answer 2:

You don't need zcat here because there is zgrep and zegrep.

If you want to run a command over a directory hierarchy, you use find:

find . -name "*.gz" -exec zgrep 'pattern' {} \;

Also, the "ls *.gz" in your for loop is unnecessary; just use the glob "*.gz" directly in the future.
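
To illustrate the point about dropping ls, here is a minimal sketch of the asker's loop written with a plain glob and quoted variables. The function name scan_gz_dir and its optional directory argument are made up for this example, and gzip -dc is used as the equivalent of zcat. It only looks at one directory, not subfolders.

```sh
# Hypothetical helper: print each .gz file name in DIR, followed by its matching lines.
# Usage: scan_gz_dir PATTERN [DIR]
scan_gz_dir() {
    for f in "${2:-.}"/*.gz; do
        # When no file matches, the glob stays as a literal string; skip it.
        [ -e "$f" ] || continue
        printf '%s\n' "$f"
        # gzip -dc decompresses to stdout, like zcat.
        gzip -dc -- "$f" | grep -e "$1"
    done
}
```

For example, scan_gz_dir "pattern match" ABC would list the files in ABC, each followed by any lines that match.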



Answer 3:

Use the find command:

find . -name "*.gz" -exec zcat "{}" + |grep "test"

Or try zcat's recursive option (-r).



Answer 4:

My zgrep doesn't support -R.

I think the solution from "Nietzche-jou" could be a better answer, but I would add the -H option to show the file name, something like this:

find . -name "*.gz" -exec zgrep -H 'PATTERN' \{\} \;


Answer 5:

Coming in a bit late on this; I had a similar problem and was able to resolve it using:

zcat -r /some/dir/here | grep "blah"

As detailed here:

http://manpages.ubuntu.com/manpages/quantal/man1/gzip.1.html

However, this does not show the original file that the result matched from, instead showing "(standard input)" as it's coming in from a pipe. zcat does not seem to support outputting a name either.

In terms of performance, this is what we got:

$ alias dropcache="sync && echo 3 > /proc/sys/vm/drop_caches"

$ find 09/01 | wc -l
4208

$ du -chs 09/01
24M

$ dropcache; time zcat -r 09/01 > /dev/null
real    0m3.561s

$ dropcache; time find 09/01 -iname '*.txt.gz' -exec zcat '{}' \; > /dev/null
real    0m38.041s

As you can see, using the find|zcat method is significantly slower than using zcat -r when dealing with even a small volume of files. I was also unable to make zcat output the file name (using -v will apparently output the filename, but not on every single line). It would appear that there isn't currently a tool that will provide both speed and name consistency with grep (i.e. the -H option).

If you need to identify the name of the file that the result belongs to, then you'll need to either write your own tool (could be done in 50 lines of Python code) or use the slower method. If you do not need to identify the name, then use zcat -r.
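
The "write your own tool" route does not have to be Python. Below is a rough sketch in plain sh that prints FILE:LINE for every match, much like grep -H. The function name grep_gz_names is hypothetical, and since it still starts one gzip process per file it most likely belongs with the slower method; I have not benchmarked it against the numbers above.

```sh
# Hypothetical helper: print "FILE:LINE" for every matching line under DIR.
# Usage: grep_gz_names PATTERN [DIR]
grep_gz_names() {
    # find hands the file names to an inner sh in batches ({} +);
    # the pattern is passed as the first argument and shifted off.
    find "${2:-.}" -type f -name '*.gz' -exec sh -c '
        pattern=$1
        shift
        for f do
            gzip -dc -- "$f" | grep -e "$pattern" |
                while IFS= read -r line; do
                    printf "%s:%s\n" "$f" "$line"
                done
        done
    ' sh "$1" {} +
}
```

Something like grep_gz_names "blah" 09/01 should then show which file each hit came from.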

Hope this helps



Answer 6:

This should do:

find . -name "*.gz" | xargs zcat | grep "pattern"
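
One caveat with a plain find | xargs pipeline is that file names containing spaces or newlines get split into several arguments. A hedged sketch of a null-delimited variant follows; it assumes your find has -print0 and your xargs has -0 (GNU and BSD versions do), and the function name grep_gz_tree is invented for the example. As with zcat -r, the output does not say which file a line came from.

```sh
# Hypothetical helper: decompress every .gz file under DIR and grep the combined stream.
# Usage: grep_gz_tree PATTERN [DIR]
grep_gz_tree() {
    # -print0 / -0 separate names with NUL bytes, so odd file names survive intact.
    find "${2:-.}" -type f -name '*.gz' -print0 |
        xargs -0 gzip -dc -- |
        grep -e "$1"
}
```

For example: grep_gz_tree "pattern" .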



Answer 7:

zgrep "string" ./*/*

Run from inside dir, the command above searches for the string in the .gz files one level down, where dir has the following subdirectory structure:

/dir
    /childDir1
              /file1.gz
              /file2.gz
    /childDir2
              /file3.gz
              /file4.gz
    /childDir3
              /file5.gz
              /file6.gz