Why does HBase need a WAL?

Posted 2019-08-10 16:23

I'm new to HBase, and I found that HBase writes every operation to both the WAL and the MemStore.

Q1: Why does HBase need a WAL?

Q2: HBase has to write to the WAL every time I put or delete data. Why doesn't it just apply the operation directly to its data files?

Tags: hbase wal
3 answers
老娘就宠你
#2 · 2019-08-10 16:35

We can recover the edits from the WAL if a RegionServer crashes. Without a WAL, data can be lost if a RegionServer fails before each MemStore is flushed and new StoreFiles are written. You can find more info here
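
To make that failure window concrete, here is a minimal toy model in Python. It is not HBase code: the class `ToyRegionServer` and its methods are invented for illustration, and it assumes a single in-memory store and a log that survives a crash. It shows that only edits made after the last flush depend on the log.

```python
class ToyRegionServer:
    """Toy model: a durable log plus a volatile in-memory store."""

    def __init__(self):
        self.wal = []          # stands in for the log on disk (survives a crash)
        self.memstore = {}     # held in memory only (lost on a crash)
        self.store_files = []  # flushed, immutable snapshots (survive a crash)

    def put(self, row, value):
        self.wal.append((row, value))  # log first ...
        self.memstore[row] = value     # ... then apply in memory

    def flush(self):
        # Persist the memstore as a new store file; the logged edits it
        # covers are no longer needed for recovery.
        self.store_files.append(dict(self.memstore))
        self.memstore.clear()
        self.wal.clear()

    def crash(self):
        self.memstore = {}  # everything held only in memory is gone

    def recover(self):
        # Replay the surviving log to rebuild the unflushed edits.
        for row, value in self.wal:
            self.memstore[row] = value

    def get(self, row):
        if row in self.memstore:
            return self.memstore[row]
        for snapshot in reversed(self.store_files):  # newest file first
            if row in snapshot:
                return snapshot[row]
        return None


server = ToyRegionServer()
server.put("row1", "a")
server.flush()            # row1 is now in a store file
server.put("row2", "b")   # row2 exists only in the log and the memstore
server.crash()
lost_before_replay = server.get("row2")  # the memstore copy is gone
server.recover()
recovered = server.get("row2")           # rebuilt from the log
```

If `put` skipped the log, `row2` would have no surviving copy after `crash()`, which is the data loss described above.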

ら.Afraid
#3 · 2019-08-10 16:36

Q1) Why does HBase need a WAL?

The WAL is there for recovery. Let's look at the HBase architecture more closely, following the MapR docs.

When the client issues a Put request, the first step is to write the data to the write-ahead log, the WAL:

  • Edits are appended to the end of the WAL file that is stored on disk.
  • The WAL is used to recover not-yet-persisted data in case a server crashes.

Once the data is written to the WAL, it is placed in the MemStore. Then, the put request acknowledgement returns to the client.
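
The ordering of those steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how HBase is implemented: a local file and `os.fsync` stand in for the WAL stored on HDFS, and the names `put`, `read_log`, and `toy.wal` are hypothetical.

```python
import json
import os
import tempfile


def put(wal_path, memstore, row, value):
    """Append the edit to the log, force it to disk, then update memory."""
    with open(wal_path, "a", encoding="utf-8") as wal:  # "a" = append only
        wal.write(json.dumps({"row": row, "value": value}) + "\n")
        wal.flush()
        os.fsync(wal.fileno())  # step 1: the edit is on disk before we go on
    memstore[row] = value       # step 2: place the edit in the in-memory store
    return "ack"                # step 3: only now acknowledge the client


def read_log(wal_path):
    """Return the logged edits in the order they were appended."""
    with open(wal_path, encoding="utf-8") as wal:
        return [json.loads(line) for line in wal]


wal_path = os.path.join(tempfile.mkdtemp(), "toy.wal")
memstore = {}
ack = put(wal_path, memstore, "row1", "v1")
put(wal_path, memstore, "row1", "v2")  # a later edit to the same row
logged = read_log(wal_path)
```

Here the log keeps both edits in arrival order, while the in-memory store holds only the latest value per row. Appending sequentially is cheap, which is part of why logging first costs less than updating sorted data files on every write.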

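Q2) HBase has to write to the WAL every time I put or delete data. Why doesn't it just apply the operation directly to its data files?

If the WAL is enabled, yes: every put or delete is logged first.

If the WAL is disabled, the write skips the log and goes straight to the MemStore, which removes the overhead of the extra WAL write. Either way, edits reach the data files only later, when the MemStore is flushed to new StoreFiles.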

NOTE:

In general, the WAL is disabled to improve the performance of row-level mutations (writes). The caveat is that edits made this way are not recoverable after a crash, which means data loss. Also, if you use Solr indexing that works off the WAL, the Solr documents won't be updated. If neither case applies to you, you can go ahead and disable the WAL.

For further reading, see my answer here

姐就是有狂的资本
#4 · 2019-08-10 16:56

HBase has its own ACID semantics: http://hbase.apache.org/acid-semantics.html

It needs a WAL so that it can replay edits if a RegionServer fails. The WAL plays an important role in providing the durability guarantee.

The WAL is optional. You can disable it for HBase writes, and if you do, you will see some performance improvement. However, in some cluster failure or disaster scenarios you can lose data. So it's a trade-off that depends on your use case.
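
The trade-off can be shown with a small Python sketch. It is a toy model, not HBase code: the function names and the `skip_wal` flag are invented for illustration. In the HBase Java client the corresponding choice is, as far as I know, made per mutation through its durability setting (`Durability.SKIP_WAL`).

```python
def write(wal, memstore, row, value, skip_wal=False):
    """Apply one edit; skip_wal=True trades durability for less work."""
    if not skip_wal:
        wal.append((row, value))  # durable record of the edit
    memstore[row] = value         # volatile copy, always written


def crash_and_recover(wal):
    """After a crash the memstore is empty; only logged edits come back."""
    recovered = {}
    for row, value in wal:
        recovered[row] = value
    return recovered


wal, memstore = [], {}
write(wal, memstore, "safe", "1")                 # logged
write(wal, memstore, "fast", "2", skip_wal=True)  # not logged
after_crash = crash_and_recover(wal)
```

Before the crash both rows are readable from the in-memory store; afterwards only `safe` comes back, because `fast` was never logged. Skipping the log is reasonable only for data you can regenerate or afford to lose.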
