I have a 1TB sparse file on Linux that actually stores only 32MB of data.
Is it possible to "efficiently" package the sparse file? The package should unpack into a 1TB sparse file on another computer. Ideally, the package should be around 32MB.
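For anyone reproducing this, a throwaway sparse file like the one described can be created instantly with truncate (a sketch; the file name is a placeholder):

```shell
# Create a 1TB sparse file: only metadata is written, no blocks are allocated.
truncate -s 1T sparse-1
ls -lh sparse-1   # apparent size: 1.0T
du -h sparse-1    # actual disk usage: 0
```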
Note: one possible solution is to use 'tar': https://wiki.archlinux.org/index.php/Sparse_file#Archiving_with_.60tar.27
However, for a 1TB sparse file, although the tarball may be small, archiving it takes too long.
Edit 1
I tested tar and gzip, and the results are as follows (note that this sparse file contains 0 bytes of data).
$ du -hs sparse-1
0 sparse-1
$ ls -lha sparse-1
-rw-rw-r-- 1 user1 user1 1.0T 2012-11-03 11:17 sparse-1
$ time tar cSf sparse-1.tar sparse-1
real 96m19.847s
user 22m3.314s
sys 52m32.272s
$ time gzip sparse-1
real 200m18.714s
user 164m33.835s
sys 10m39.971s
$ ls -lha sparse-1*
-rw-rw-r-- 1 user1 user1 1018M 2012-11-03 11:17 sparse-1.gz
-rw-rw-r-- 1 user1 user1 10K 2012-11-06 23:13 sparse-1.tar
The 1TB file sparse-1, which contains 0 bytes of data, can be archived by 'tar' into a 10KB tarball or compressed by gzip into a ~1GB file. gzip takes around twice as long as tar.
From this comparison, 'tar' seems better than gzip.
However, 96 minutes is still too long for a sparse file that contains 0 bytes of data.
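If gzip is used despite the slow archiving, the holes can at least be recreated at unpack time. A minimal sketch, assuming GNU dd (its conv=sparse seeks over all-zero output blocks instead of writing them); the file names are placeholders:

```shell
# Decompress and recreate holes in one pass: dd seeks over zero-filled
# output blocks instead of writing them (GNU dd's conv=sparse).
gunzip -c sparse-1.gz | dd of=sparse-1-restored bs=1M conv=sparse
```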
Edit 2
rsync seems to finish copying the file in more time than tar but less than gzip:
$ time rsync --sparse sparse-1 sparse-1-copy
real 124m46.321s
user 107m15.084s
sys 83m8.323s
$ du -hs sparse-1-copy
4.0K sparse-1-copy
Hence, tar + cp or scp should be faster than rsync directly for this extremely sparse file.
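The archive-then-copy steps can also be combined into one streaming pipeline (a sketch; the host and paths are placeholders): tar's -S flag skips holes when reading, and -S on the receiving side recreates them, so no intermediate tarball is written on either machine.

```shell
# Stream a sparse-aware tar archive over ssh in one step.
tar cSf - sparse-1 | ssh user@remote 'tar xSf - -C /destination'
```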
Edit 3
Thanks to @mvp for pointing out the SEEK_HOLE functionality in newer kernels. (I was previously working on a 2.6.32 Linux kernel.)
Note: bsdtar version >=3.0.4 is required (check here: http://ask.fclose.com/4/how-to-efficiently-archive-a-very-large-sparse-file?show=299#c299 ).
On a newer kernel and Fedora release (17), tar and cp handle the sparse file very efficiently.
[zma@office tmp]$ ls -lh pmem-1
-rw-rw-r-- 1 zma zma 1.0T Nov 7 20:14 pmem-1
[zma@office tmp]$ time tar cSf pmem-1.tar pmem-1
real 0m0.003s
user 0m0.003s
sys 0m0.000s
[zma@office tmp]$ time cp pmem-1 pmem-1-copy
real 0m0.020s
user 0m0.000s
sys 0m0.003s
[zma@office tmp]$ ls -lh pmem*
-rw-rw-r-- 1 zma zma 1.0T Nov 7 20:14 pmem-1
-rw-rw-r-- 1 zma zma 1.0T Nov 7 20:15 pmem-1-copy
-rw-rw-r-- 1 zma zma 10K Nov 7 20:15 pmem-1.tar
[zma@office tmp]$ mkdir t
[zma@office tmp]$ cd t
[zma@office t]$ time tar xSf ../pmem-1.tar
real 0m0.003s
user 0m0.000s
sys 0m0.002s
[zma@office t]$ ls -lha
total 8.0K
drwxrwxr-x 2 zma zma 4.0K Nov 7 20:16 .
drwxrwxrwt. 35 root root 4.0K Nov 7 20:16 ..
-rw-rw-r-- 1 zma zma 1.0T Nov 7 20:14 pmem-1
I am using a 3.6.5 kernel:
[zma@office t]$ uname -a
Linux office.zhiqiangma.com 3.6.5-1.fc17.x86_64 #1 SMP Wed Oct 31 19:37:18 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
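One extra check worth making after extraction (a sketch, using the pmem-1 file from above) is that the copy is still sparse, i.e. its allocated block count stays near zero while the apparent size is 1TB:

```shell
# Apparent size vs. actually allocated blocks: a sparse file shows a huge
# size (%s) but a near-zero block count (%b).
stat -c 'size=%s blocks=%b' pmem-1
du -h pmem-1    # should be ~0 for a fully sparse file
```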
Short answer: Use bsdtar to create archives, and GNU tar to extract them on another box.

Long answer: There are some requirements for this to work.
First, Linux must be at least kernel 3.1 (Ubuntu 12.04 or later would do), so it supports SEEK_HOLE functionality.
Then, you need a tar utility that supports this syscall. At the moment, GNU tar does not support it, but bsdtar does - install it using sudo apt-get install bsdtar.

While bsdtar (which uses libarchive) is awesome, unfortunately it is not very smart when it comes to untarring - it requires at least as much free space on the target drive as the untarred file size, without regard to holes. GNU tar will untar such sparse archives efficiently and will not check this condition.

This is a log from Ubuntu 12.10 (Linux kernel 3.5):
Like I said above, unfortunately, untarring with bsdtar will not work unless you have 1TB of free space. However, GNU tar works just fine to untar such a sparse.tar:

From a related question, maybe rsync will work:

You're definitely looking for a compression tool such as tar, lzma, bzip2, zip or rar. According to this site, lzma is quite fast while still having quite a good compression ratio:

http://blog.terzza.com/linux-compression-comparison-gzip-vs-bzip2-vs-lzma-vs-zip-vs-compress/
You can also adjust the speed/quality trade-off of the compression by setting the compression level to something low; experiment a bit to find a level that works best:

http://linux.die.net/man/1/unlzma
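For example (a sketch; the file name is a placeholder), gzip and the lzma/xz family all take a numeric level flag, with -1 fastest and -9 smallest:

```shell
# -1 trades compression ratio for speed (gzip's default is -6);
# gzip replaces bigfile with bigfile.gz.
gzip -1 bigfile
```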