I normally compress using tar zcvf and decompress using tar zxvf (using gzip due to habit).
I've recently gotten a quad core CPU with hyperthreading, so I have 8 logical cores, and I notice that many of the cores are unused during compression/decompression.
Is there any way I can utilize the unused cores to make it faster?
If you want to have more flexibility with filenames and compression options, you can use:
Step 1: find
find /my/path/ -type f \( -name "*.sql" -o -name "*.log" \) -exec
This command will look for the files you want to archive, in this case /my/path/*.sql and /my/path/*.log. Add as many -o -name "pattern" as you want inside the escaped parentheses; without them, -type f would apply only to the first pattern and -exec only to the last one. -exec will execute the next command using the results of find: tar
Step 2: tar
tar -P --transform='s@/my/path/@@g' -cf - {} +
--transform takes a sed-style replacement expression. Here it strips the path of the files from the archive, so the tarball's root becomes the current directory when extracting. Note that you can't use the -C option to change directory, as you'd lose the benefit of find: all files in the directory would be included.
-P tells tar to use absolute paths, so it doesn't trigger the warning "Removing leading `/' from member names". The leading '/' will be removed by --transform anyway.
-cf - tells tar to write the archive to standard output; the tarball name is specified later.
{} + passes every file that find found previously.
Step 3: pigz
pigz -9 -p 4
Use as many parameters as you want. In this case, -9 is the compression level and -p 4 is the number of cores dedicated to compression. If you run this on a heavily loaded web server, you probably don't want to use all available cores.
Step 4: archive name
> myarchive.tar.gz
Finally.
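Joined together, the four steps form a single pipeline. The following is a minimal sketch, assuming GNU find and GNU tar (for -P and --transform); the temporary directory stands in for /my/path/, the sample file names are made up for the demo, and gzip is used as a fallback when pigz is not installed.

```sh
#!/bin/sh
# Sketch only: a throwaway directory stands in for /my/path/.
src=$(mktemp -d)
out=$(mktemp -d)
echo "select 1;" > "$src/dump.sql"
echo "started"   > "$src/app.log"
echo "ignored"   > "$src/notes.txt"

# Use pigz when it is installed, otherwise fall back to single-threaded gzip.
if command -v pigz >/dev/null 2>&1; then
    compress="pigz -9 -p 4"
else
    compress="gzip -9"
fi

# Steps 1-4 as one pipeline. The escaped parentheses make -type f and -exec
# apply to both name patterns.
find "$src/" -type f \( -name "*.sql" -o -name "*.log" \) \
    -exec tar -P --transform="s@$src/@@g" -cf - {} + \
    | $compress > "$out/myarchive.tar.gz"

# List the archive members; they should carry no directory prefix.
tar -tzf "$out/myarchive.tar.gz"
```

One caveat: with {} +, find may start tar more than once when the file list is very long, in which case several tar streams end up concatenated in the output.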
You can use pigz instead of gzip, which does gzip compression on multiple cores. Instead of using the -z option, you would pipe it through pigz:
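A minimal sketch of that pipe in both directions, assuming a scratch directory and a placeholder folder named project; gzip is substituted when pigz is not installed, since both read and write the same format.

```sh
#!/bin/sh
# Sketch only: "project" is a placeholder directory created for the demo.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir project
echo "hello" > project/file.txt

# Fall back to gzip if pigz is not installed.
gz=$(command -v pigz || command -v gzip)

# Compress: tar writes the archive to stdout (-f -) and pigz compresses it.
tar cf - project | "$gz" > archive.tar.gz

# Decompress: -dc writes the tar stream to stdout and tar reads it from stdin.
mkdir restored
"$gz" -dc archive.tar.gz | tar xf - -C restored
```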
By default, pigz uses the number of available cores, or eight if it could not query that. You can ask for more with -p n, e.g. -p 32. pigz has the same options as gzip, so you can request better compression with -9. E.g.
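For instance, the options mentioned above could be combined as in this sketch. The directory name project and the scratch location are assumptions for the demo, and the gzip branch exists only so the script still runs where pigz is missing.

```sh
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1
mkdir project
echo "hello" > project/file.txt

if command -v pigz >/dev/null 2>&1; then
    # -9 selects the best compression level, -p 32 requests 32 threads.
    tar cf - project | pigz -9 -p 32 > archive.tar.gz
else
    # gzip accepts -9 as well but has no -p option.
    tar cf - project | gzip -9 > archive.tar.gz
fi

# Verify that the result is a valid gzip stream.
gzip -t archive.tar.gz && echo "archive OK"
```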
You can use the shortcut -I for tar's --use-compress-program switch, and invoke pbzip2 for bzip2 compression on multiple cores.
Common approach
There is an option for the tar program:
-I, --use-compress-program=PROG   filter through PROG (must accept -d)
You can use a multithreaded version of an archiver or compressor utility.
The most popular multithreaded compressors are pigz (instead of gzip) and pbzip2 (instead of bzip2). For instance:
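A minimal sketch of the -I form, assuming GNU tar and a made-up directory named data. It uses pigz when available and gzip otherwise; pbzip2 can be substituted in the same way to produce a .tar.bz2 archive.

```sh
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1
mkdir data
echo "hello" > data/file.txt

# Fall back to gzip when pigz is not installed.
gz=$(command -v pigz || command -v gzip)

# -I is the short spelling of --use-compress-program.
tar -I "$gz" -cf OUTPUT_FILE.tar.gz data

# On extraction, tar calls the same program with -d.
mkdir out
tar -I "$gz" -xf OUTPUT_FILE.tar.gz -C out
```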
The compressor must accept -d. If your replacement utility doesn't have this parameter and/or you need to specify additional parameters, then use pipes (add parameters if necessary):
The input and output of the single-threaded and multithreaded versions are compatible. You can compress using the multithreaded version and decompress using the single-threaded version, and vice versa.
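The compatibility can be checked with a short round trip. This sketch assumes a scratch directory named data; it compresses with pigz when it is installed (otherwise the demo degenerates to gzip on both sides) and extracts with tar's built-in -z, which runs plain gzip.

```sh
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1
mkdir data
echo "hello" > data/file.txt

# Compress with the multithreaded tool when available.
gz=$(command -v pigz || command -v gzip)
tar cf - data | "$gz" > data.tar.gz

# Decompress with the single-threaded path: -z makes tar run gzip itself.
mkdir out
tar xzf data.tar.gz -C out
```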
p7zip
For compression with p7zip, you need a small shell script like the following:
Save it as 7zhelper.sh. Here is an example of usage:
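One possible shape for such a helper is sketched below; it is an illustration, not a tested recipe for every p7zip release. It assumes that 7za is on PATH and can read and write an xz stream on stdin/stdout via -si and -so. The script is written to a scratch directory here, and the archive and path names in the usage comments are placeholders.

```sh
#!/bin/sh
# Write the helper into a scratch directory.
work=$(mktemp -d)
cat > "$work/7zhelper.sh" <<'EOF'
#!/bin/sh
# tar runs this with -d to decompress and with no arguments to compress.
# Assumption: 7za handles xz streams on stdin/stdout with -si/-so.
case $1 in
  -d) 7za -txz -si -so e ;;
   *) 7za -txz -si -so a . ;;
esac 2>/dev/null
EOF
chmod +x "$work/7zhelper.sh"

# Usage once 7za is installed (names are placeholders):
#   tar -I "$work/7zhelper.sh" -cf OUTPUT_FILE.tar.7z paths_to_archive
#   tar -I "$work/7zhelper.sh" -xf OUTPUT_FILE.tar.7z
```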
xz
Regarding multithreaded XZ support: if you are running version 5.2.0 or above of XZ Utils, you can utilize multiple cores for compression by setting -T or --threads to an appropriate value via the environment variable XZ_DEFAULTS (e.g. XZ_DEFAULTS="-T 0").
The option is described in the man page for version 5.1.0alpha. However, according to the man page for version 5.2.2, this will not work for decompression of files that haven't also been compressed with threading enabled.
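Returning to the XZ_DEFAULTS setting for compression, a minimal sketch could look like this. It assumes XZ Utils 5.2.0 or later, a tar that supports -J, and a made-up directory named data.

```sh
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1
mkdir data
echo "hello" > data/file.txt

# -T 0 asks xz to use as many threads as there are cores.
# tar's -J runs xz, which picks the option up from the environment.
XZ_DEFAULTS="-T 0" tar -cJf data.tar.xz data

# Check the integrity of the compressed stream.
xz -t data.tar.xz && echo "archive OK"
```

Setting the variable on the command line keeps it scoped to this one tar invocation; exporting it would affect every later xz call in the shell.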
Recompiling with replacement
If you build tar from source, you can recompile it with parameters that select the replacement programs.
After recompiling tar with these options, you can check the output of tar's help:
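As a rough sketch of that check, assuming GNU tar: the configure line in the comments is an assumption about how the replacements would be selected (pigz, lbzip2 and plzip are example choices) and is not executed here; the grep at the end runs against whatever tar is installed.

```sh
#!/bin/sh
# Assumption: configure options of this form, run in an unpacked GNU tar
# source tree, select the replacement programs (not executed here):
#   ./configure --with-gzip=pigz --with-bzip2=lbzip2 --with-lzip=plzip
#   make && make install

# The help text names the program behind each compression flag, so a rebuilt
# tar should list the replacements instead of gzip, bzip2 and lzip.
tar --help | grep "filter the archive through"
```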
You can also use the tar flag "--use-compress-program=" to tell tar what compression program to use.
For example:
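A minimal sketch of the long option, assuming a placeholder directory named dir_to_zip in a scratch location, with gzip as a fallback when pigz is not installed.

```sh
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1
mkdir dir_to_zip
echo "hello" > dir_to_zip/file.txt

# Fall back to gzip when pigz is not installed.
prog=$(command -v pigz || command -v gzip)

# Create the archive through the chosen program.
tar -c --use-compress-program="$prog" -f archive.tar.gz dir_to_zip

# List it back through the same program.
tar -t --use-compress-program="$prog" -f archive.tar.gz
```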