Shell script: join small gzipped files into larger ones

Posted 2019-08-03 15:52

Question:

I have some 8000 gz files of around 60MB each. I want to merge them into a few larger files. How can I do this in a bash script without unzipping them?

The script may take as input either the target size of the new files or the number of files to combine.

For example, say I have 1.gz, 2.gz, 3.gz ... 10.gz, and I want one new file per 3 files: 1.gz, 2.gz and 3.gz should be combined into 1_new.gz, and so on.

Answer 1:

It is possible to concatenate gzipped files; however, when you gunzip the resulting file you'll get a single stream (see the gzip manual for reference).
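A quick way to see this behaviour: compress two small files separately, concatenate the .gz files with cat, and decompress the result. The file names are made up for the demonstration.

```shell
#!/bin/bash
# Demonstrate that concatenated gzip files decompress as one stream.
set -e
tmp=$(mktemp -d)
cd "$tmp"

printf 'first\n'  > a.txt
printf 'second\n' > b.txt
gzip a.txt b.txt                 # produces a.txt.gz and b.txt.gz

cat a.txt.gz b.txt.gz > both.gz  # plain byte concatenation, no decompression
gzip -t both.gz                  # integrity check accepts the multi-member file
gunzip -c both.gz                # prints "first" then "second"
```

The original boundaries between the inputs are not recorded anywhere, which is why tar is the better choice when you need to get the individual files back.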

A script would look similar to Ansgar Wiechers's tar version:

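#!/bin/bash

# Usage: ./merge.sh N   (N = number of input files per output archive)
maxnum=$1
i=0   # files appended to the current archive so far
j=0   # index of the current archive
for f in *.gz; do
   case $f in archive_*) continue ;; esac   # skip outputs of an earlier run
   cat "$f" >> "archive_$j.gz"
   i=$((i+1))
   if [ "$i" -eq "$maxnum" ]; then
      i=0
      j=$((j+1))
   fi
done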

Note that the above code is untested.

If you want to archive files properly, tar is a better solution, but if all you want is to concatenate a number of gzipped files, a simple concatenation like this is fine.
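The question also allows the target size, rather than a file count, as input. A minimal sketch of that variant of the same cat approach, written as a bash function; the name merge_by_size and the merged_<j>.gz output names are illustrative, and the limit is assumed to be in compressed bytes. A group is closed once it reaches the limit, so an output can overshoot by up to one input file.

```shell
#!/bin/bash
# merge_by_size MAXBYTES  (hypothetical helper, not from the answers above)
# Concatenate the *.gz files in the current directory into merged_<j>.gz,
# starting a new output once the current one has reached MAXBYTES.
merge_by_size() {
  local maxbytes=$1 j=0 size=0 f
  for f in *.gz; do
    case $f in merged_*) continue ;; esac   # skip outputs of an earlier run
    cat "$f" >> "merged_$j.gz"
    size=$((size + $(wc -c < "$f")))        # compressed bytes in this group
    if [ "$size" -ge "$maxbytes" ]; then
      size=0
      j=$((j+1))
    fi
  done
}
```

For example, merge_by_size $((180 * 1024 * 1024)) would produce outputs of at least 180 MiB each, i.e. three or four of the ~60MB inputs per file.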



Answer 2:

gzip can only compress single files. You need tar to combine multiple files into a single archive, which can then (optionally) be compressed with gzip. If you just want to merge the compressed files, you could use something like this:

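maxnum=$1
i=0   # files added to the current archive so far
j=0   # index of the current archive
for f in *.gz; do
  tar rf "archive_$j.tar" "$f"   # append the .gz file unchanged
  i=$((i+1))
  if [ "$i" -eq "$maxnum" ]; then
    i=0
    j=$((j+1))
  fi
done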

This produces uncompressed tar files that contain the compressed source files.
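Because each member is still an ordinary gzip file, you can list such an archive and decompress a single member straight out of it without unpacking the rest. A small sketch with made-up sample data, assuming a tar that supports -O (write to stdout), as GNU tar and bsdtar do:

```shell
#!/bin/bash
# Read data back out of an uncompressed tar of .gz files.
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Sample inputs standing in for 1.gz, 2.gz, 3.gz
for n in 1 2 3; do printf 'data %s\n' "$n" > "$n"; gzip "$n"; done
tar cf archive_0.tar 1.gz 2.gz 3.gz   # same layout the tar rf loop builds

tar tf archive_0.tar                  # lists 1.gz, 2.gz and 3.gz
tar xOf archive_0.tar 2.gz | gunzip   # prints "data 2"; nothing written to disk
```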

If you want compressed tar files containing the uncompressed source files, the tar rf approach above won't work, because tar can't append to a compressed archive. You'll need to uncompress the source files first and then create a compressed tar file from each batch:

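maxnum=$1
i=0
j=0
flist=()
for f in *.gz; do
  case $f in archive_*) continue ;; esac   # skip outputs of an earlier run
  gunzip "$f"                              # replaces foo.gz with foo
  flist+=("${f%.gz}")
  i=$((i+1))
  if [ "$i" -eq "$maxnum" ]; then
    # --remove-files (GNU tar) deletes the inputs once they are archived
    tar czf "archive_$j.tar.gz" --remove-files "${flist[@]}"
    i=0
    j=$((j+1))
    flist=()
  fi
done
# archive whatever is left over from an incomplete final batch
if [ "${#flist[@]}" -gt 0 ]; then
  tar czf "archive_$j.tar.gz" --remove-files "${flist[@]}"
fi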


Tags: bash shell