I'm new to HBase, and I found that HBase writes every operation to both the WAL and the MemStore.
Q1: Why does HBase need the WAL?
Q2: HBase must write to the WAL every time I put or delete data. Why doesn't it just apply the change directly to its data files?
We can recover the edits from the WAL if a RegionServer crashes. Without the WAL, there is a possibility of data loss if a RegionServer fails before each MemStore is flushed and new StoreFiles are written. You can find more info here
The WAL is for recovery purposes. Let's look at the HBase architecture more closely, following the MapR docs. When the client issues a Put request, the first step is to write the data to the write-ahead log, the WAL:
Once the data is written to the WAL, it is placed in the MemStore. Then, the put request acknowledgement returns to the client.
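The write path and the crash recovery described above can be sketched with a small toy model. This is only an illustration under simplifying assumptions: `ToyRegionServer` and its methods are hypothetical names, not HBase classes, and an in-memory list stands in for the log that HBase keeps on durable storage.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical names, not HBase classes.
class ToyRegionServer {
    // Stands in for the durable write-ahead log; it survives a crash.
    private final List<String[]> wal = new ArrayList<>();
    // Stands in for the in-memory MemStore; it is lost on a crash.
    private Map<String, String> memStore = new HashMap<>();

    // Write path: append to the WAL first, then update the MemStore,
    // then acknowledge the client.
    boolean put(String row, String value) {
        wal.add(new String[] {row, value});
        memStore.put(row, value);
        return true; // acknowledgement
    }

    // Simulate a crash before any flush: in-memory state disappears.
    void crash() {
        memStore = new HashMap<>();
    }

    // Recovery: replay the logged edits in order to rebuild the MemStore.
    void recover() {
        for (String[] edit : wal) {
            memStore.put(edit[0], edit[1]);
        }
    }

    String get(String row) {
        return memStore.get(row);
    }

    public static void main(String[] args) {
        ToyRegionServer rs = new ToyRegionServer();
        rs.put("row1", "a");
        rs.put("row1", "b");
        rs.crash();
        System.out.println(rs.get("row1")); // prints null: the edit is gone from memory
        rs.recover();
        System.out.println(rs.get("row1")); // prints b: replay restored the latest value
    }
}
```

Because the log is appended before the MemStore is touched, every acknowledged edit is guaranteed to be in the log, so replaying it after a crash loses nothing.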
If the WAL is enabled: yes, every put or delete is written to it. If the WAL is disabled, the edit skips the log and goes only to the MemStore (reaching the data files at the next flush), which removes the extra overhead of writing to the WAL.

NOTE: In general, the WAL is disabled only to improve write (row-level mutation) performance. The caveat is that unflushed edits won't be recoverable after a failure, which means data loss. Also, if you are using Solr indexing that works off the WAL, the Solr documents won't be updated. If neither case applies to you, you can go ahead and disable the WAL.
For further reading, see my answer here
HBase has its own ACID semantics: http://hbase.apache.org/acid-semantics.html
It needs a WAL so that it can replay edits if a RegionServer fails. The WAL plays an important role in providing the durability guarantee.
The WAL is optional. You can disable it for HBase writes. If it's disabled, you will see some performance improvement. However, in some cluster failure or disaster scenarios you can lose data. So it's a trade-off that depends on your use case.
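To make the trade-off concrete, here is a minimal toy sketch of a store where each write chooses whether to go through the log. The class `ToyDurabilityStore` and its `Mode` enum are hypothetical names for illustration, not the HBase client API. In the real Java client, as far as I know, the equivalent choice is made per mutation with `Put.setDurability(Durability.SKIP_WAL)`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical names, not the HBase client API.
class ToyDurabilityStore {
    enum Mode { USE_WAL, SKIP_WAL }

    private final List<String[]> wal = new ArrayList<>();            // durable log
    private final Map<String, String> storeFiles = new HashMap<>();  // flushed, durable data
    private Map<String, String> memStore = new HashMap<>();          // volatile

    // SKIP_WAL saves the log append but leaves the edit only in memory.
    void put(String row, String value, Mode mode) {
        if (mode == Mode.USE_WAL) {
            wal.add(new String[] {row, value});
        }
        memStore.put(row, value);
    }

    // A flush persists the MemStore; the logged edits are no longer needed.
    void flush() {
        storeFiles.putAll(memStore);
        memStore.clear();
        wal.clear();
    }

    // A crash wipes memory; recovery can only replay what was logged.
    void crashAndRecover() {
        memStore = new HashMap<>();
        for (String[] edit : wal) {
            memStore.put(edit[0], edit[1]);
        }
    }

    String get(String row) {
        String v = memStore.get(row);
        return v != null ? v : storeFiles.get(row);
    }

    public static void main(String[] args) {
        ToyDurabilityStore store = new ToyDurabilityStore();
        store.put("logged", "1", Mode.USE_WAL);
        store.put("fast", "2", Mode.SKIP_WAL);
        store.crashAndRecover();
        System.out.println(store.get("logged")); // prints 1
        System.out.println(store.get("fast"));   // prints null: lost with the MemStore
    }
}
```

In this model the unlogged edit is at risk only between the write and the next flush. Once flushed, it survives a crash even though it never touched the log.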