I need to back up an HBase table before upgrading to a newer version. I decided to export the table to HDFS with the standard Export tool and then move it to the local file system. For some reason the exported table is 4 times larger than the original one:
hdfs dfs -du -h
1.4T backup-my-table
hdfs dfs -du -h /hbase/data/default/
417G my-table
What can be the reason? Is it somehow related to compression?
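For what it's worth, the exact ratio from the sizes above (treating the T and G reported by hdfs as binary units, TiB and GiB) works out closer to 3.4x than 4x:

```python
# Sizes as reported by `hdfs dfs -du -h` above.
GIB_PER_TIB = 1024
exported_gib = 1.4 * GIB_PER_TIB  # 1.4T  backup-my-table
original_gib = 417.0              # 417G  my-table

ratio = exported_gib / original_gib
print(f"export is {ratio:.1f}x the original size")  # roughly 3.4x
```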
P.S. Maybe the way I made the backup matters. First I took a snapshot of the target table, then cloned it to a copy table, then deleted the unnecessary column families from that copy (so I expected the result to be about half the size), and then ran the Export tool on the copy.
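The steps above can be sketched as HBase shell commands (the table, snapshot, and column-family names here are placeholders, not the actual ones):

```
# inside `hbase shell`
snapshot 'my-table', 'my-table-snap'             # snapshot the target table
clone_snapshot 'my-table-snap', 'my-table-copy'  # clone it to a copy table
alter 'my-table-copy', 'delete' => 'unneeded_cf' # drop an unnecessary column family
```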
Update for future visitors: here is the correct command to export a table with compression:
./hbase org.apache.hadoop.hbase.mapreduce.Export \
-Dmapreduce.output.fileoutputformat.compress=true \
-Dmapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec \
-Dmapreduce.output.fileoutputformat.compress.type=BLOCK \
-Dhbase.client.scanner.caching=200 \
table-to-export export-dir
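For restoring from such an export later, the matching Import tool reads the same SequenceFile output; a sketch mirroring the directory name above (note that the target table must already exist before running Import):

```
./hbase org.apache.hadoop.hbase.mapreduce.Import \
  table-to-import export-dir
```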
Maybe your original table is compressed using SNAPPY or some other codec, while the exported files are not.

Compression support check: use CompressionTest to verify that Snappy support is enabled and the libs can be loaded on ALL NODES of your cluster.
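A minimal way to run that check, assuming the hbase launcher is on your PATH (the scratch file path here is just an example; the tool writes and reads back a test file with the given codec):

```
hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/compression-test snappy
```

Run it on every node; a failure on any one node means exports or reads using that codec can break there.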
Export command source: if you dig into the Export command source, you will find the output-compression properties (the mapreduce.output.fileoutputformat.compress* family shown in the question's update), which can reduce the size drastically.

Also see getExportFilter, which might be useful in your case to narrow your export.
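For example, Export accepts optional trailing arguments to narrow the scan; a sketch assuming the standard `Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]` usage, where the version count and epoch-millisecond timestamps below are placeholders:

```
./hbase org.apache.hadoop.hbase.mapreduce.Export \
  table-to-export export-dir \
  1 1483228800000 1485907200000
```

This exports at most 1 version of each cell, and only cells whose timestamps fall in the given range.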