I have a 1TB sparse file on Linux that actually stores only 32MB of data.
Is it possible to "efficiently" package the sparse file? The package should unpack to a 1TB sparse file on another computer. Ideally, the package should be around 32MB.
Note: One possible solution is to use 'tar': https://wiki.archlinux.org/index.php/Sparse_file#Archiving_with_.60tar.27
However, for a 1TB sparse file, although the tarball may be small, archiving the sparse file takes too long.
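For anyone who wants to reproduce the situation on a smaller scale, here is a minimal sketch that creates a sparse file and compares its apparent size with the space actually allocated. It assumes GNU coreutils (truncate, dd, stat) and a filesystem that supports sparse files; the file name sparse-demo and the 64MB/1MB sizes are made up for illustration and are not the 1TB/32MB file from the question.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory (hypothetical demo setup).
workdir=$(mktemp -d)
cd "$workdir"

# Create a file with a 64 MiB apparent size and no data blocks.
truncate -s 64M sparse-demo

# Write 1 MiB of real data at offset 32 MiB without truncating the file.
dd if=/dev/urandom of=sparse-demo bs=1M count=1 seek=32 conv=notrunc status=none

# Apparent size in bytes, as shown by ls -l.
apparent=$(stat -c %s sparse-demo)

# Allocated size in bytes: block count times the block unit stat reports.
allocated=$(( $(stat -c %b sparse-demo) * $(stat -c %B sparse-demo) ))

echo "apparent: $apparent bytes, allocated: $allocated bytes"
```

The gap between the two numbers is what a sparse-aware archiver can skip.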
Edit 1
I tested tar and gzip; the results are as follows (note that this sparse file contains 0 bytes of data).
$ du -hs sparse-1
0 sparse-1
$ ls -lha sparse-1
-rw-rw-r-- 1 user1 user1 1.0T 2012-11-03 11:17 sparse-1
$ time tar cSf sparse-1.tar sparse-1
real 96m19.847s
user 22m3.314s
sys 52m32.272s
$ time gzip sparse-1
real 200m18.714s
user 164m33.835s
sys 10m39.971s
$ ls -lha sparse-1*
-rw-rw-r-- 1 user1 user1 1018M 2012-11-03 11:17 sparse-1.gz
-rw-rw-r-- 1 user1 user1 10K 2012-11-06 23:13 sparse-1.tar
The 1TB file sparse-1, which contains 0 bytes of data, can be archived by 'tar' into a 10KB tarball or compressed by gzip into a ~1GB file. gzip takes about twice as long as tar.
From the comparison, 'tar' seems better than gzip.
However, 96 minutes is too long for a sparse file that contains 0 bytes of data.
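The size difference can be checked on a small scale with a sketch like the one below. It assumes GNU tar and GNU coreutils; sparse-demo and the 64MB size are illustrative stand-ins, not the 1TB file measured above. The idea is that tar with -S stores a map of the holes, while gzip is not sparse-aware and has to compress every zero byte.

```shell
#!/bin/sh
set -eu

# Throwaway directory for the experiment (hypothetical demo setup).
workdir=$(mktemp -d)
cd "$workdir"

# A 64 MiB file consisting only of a hole, like sparse-1 but much smaller.
truncate -s 64M sparse-demo

# Archive with sparse handling enabled (-S), as in the test above.
tar cSf sparse-demo.tar sparse-demo

# Compress to a separate file so the original is kept for comparison.
gzip -c sparse-demo > sparse-demo.gz

tar_size=$(stat -c %s sparse-demo.tar)
gz_size=$(stat -c %s sparse-demo.gz)

echo "tar -S: $tar_size bytes, gzip: $gz_size bytes"
```

The tar archive stays at a fixed small size regardless of the hole length, whereas the gzip output grows with the apparent size of the file.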
Edit 2
rsync seems to take longer than tar but less time than gzip to copy the file:
$ time rsync --sparse sparse-1 sparse-1-copy
real 124m46.321s
user 107m15.084s
sys 83m8.323s
$ du -hs sparse-1-copy
4.0K sparse-1-copy
Hence, tar + cp or scp should be faster than using rsync directly for this extremely sparse file.
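A variation on the tar + cp/scp idea is to stream the archive through a pipe, so no intermediate tarball is written. The sketch below does this locally, assuming GNU tar and GNU coreutils; the directory layout and the name sparse-demo are hypothetical. For a copy to another machine, the second tar would run on the remote side over ssh, as in the commented line (the host and target path there are placeholders).

```shell
#!/bin/sh
set -eu

# Hypothetical source and destination directories for the demo.
workdir=$(mktemp -d)
mkdir "$workdir/src" "$workdir/dst"
cd "$workdir/src"

# 64 MiB sparse file with a few bytes of real data at offset 32 MiB.
truncate -s 64M sparse-demo
printf 'payload' | dd of=sparse-demo bs=1M seek=32 conv=notrunc status=none

# Stream the sparse-aware archive straight into the extracting tar.
tar cSf - sparse-demo | tar xSf - -C "$workdir/dst"

# Across machines (placeholder host and path), the same idea would be:
#   tar cSf - sparse-demo | ssh user@remote 'tar xSf - -C /target/dir'

src_size=$(stat -c %s sparse-demo)
dst_size=$(stat -c %s "$workdir/dst/sparse-demo")
dst_alloc=$(( $(stat -c %b "$workdir/dst/sparse-demo") * $(stat -c %B "$workdir/dst/sparse-demo") ))

echo "source: $src_size bytes, copy: $dst_size bytes, copy allocated: $dst_alloc bytes"
```

Only the archive stream crosses the pipe, so the amount transferred follows the real data rather than the apparent size. On an old kernel the creating tar may still need to scan the whole file, so this saves transfer but not necessarily scan time.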
Edit 3
Thanks to @mvp for pointing out the SEEK_HOLE functionality in newer kernels. (I was previously working on a 2.6.32 Linux kernel.)
Note: bsdtar version >=3.0.4 is required (check here: http://ask.fclose.com/4/how-to-efficiently-archive-a-very-large-sparse-file?show=299#c299 ).
On a newer kernel and Fedora release (17), tar and cp handle the sparse file very efficiently:
[zma@office tmp]$ ls -lh pmem-1
-rw-rw-r-- 1 zma zma 1.0T Nov 7 20:14 pmem-1
[zma@office tmp]$ time tar cSf pmem-1.tar pmem-1
real 0m0.003s
user 0m0.003s
sys 0m0.000s
[zma@office tmp]$ time cp pmem-1 pmem-1-copy
real 0m0.020s
user 0m0.000s
sys 0m0.003s
[zma@office tmp]$ ls -lh pmem*
-rw-rw-r-- 1 zma zma 1.0T Nov 7 20:14 pmem-1
-rw-rw-r-- 1 zma zma 1.0T Nov 7 20:15 pmem-1-copy
-rw-rw-r-- 1 zma zma 10K Nov 7 20:15 pmem-1.tar
[zma@office tmp]$ mkdir t
[zma@office tmp]$ cd t
[zma@office t]$ time tar xSf ../pmem-1.tar
real 0m0.003s
user 0m0.000s
sys 0m0.002s
[zma@office t]$ ls -lha
total 8.0K
drwxrwxr-x 2 zma zma 4.0K Nov 7 20:16 .
drwxrwxrwt. 35 root root 4.0K Nov 7 20:16 ..
-rw-rw-r-- 1 zma zma 1.0T Nov 7 20:14 pmem-1
I am using a 3.6.5 kernel:
[zma@office t]$ uname -a
Linux office.zhiqiangma.com 3.6.5-1.fc17.x86_64 #1 SMP Wed Oct 31 19:37:18 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
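To confirm that a copy made by cp really stays sparse on a given system, a quick check along these lines may help. It is a sketch assuming GNU coreutils (cp with --sparse, du with --apparent-size); the file names are hypothetical, and --sparse=always is used here to request hole creation explicitly rather than relying on cp's default detection.

```shell
#!/bin/sh
set -eu

# Throwaway directory for the check (hypothetical demo setup).
workdir=$(mktemp -d)
cd "$workdir"

# 64 MiB file consisting only of a hole.
truncate -s 64M sparse-demo

# Ask cp to create holes in the destination wherever the source has zeros.
cp --sparse=always sparse-demo sparse-demo-copy

# Apparent size versus disk usage of the copy, both in KiB.
apparent_kib=$(du -k --apparent-size sparse-demo-copy | cut -f1)
allocated_kib=$(du -k sparse-demo-copy | cut -f1)

echo "copy apparent: ${apparent_kib} KiB, allocated: ${allocated_kib} KiB"
```

If the allocated figure is close to the apparent one, the copy was not written sparsely and the fast path shown above is probably not in effect on that system.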