Possible memory issue crashing HBase Thrift Server

Posted 2019-05-11 20:03

Question:

I'm running Cloudera CDH4 with HBase and the HBase Thrift Server. Several times a day, the Thrift Server crashes.

In /var/log/hbase/hbase-hbase-thrift-myserver.out, there is this:

#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
#   Executing /bin/sh -c "kill -9 8151"...

In /var/log/hbase/hbase-hbase-thrift-myserver.log, there are no error messages at the end of the file. There are only a lot of DEBUG messages stating that one of the nodes is caching a particular file.

I can't find any configuration options for the HBase Thrift Server. There are no obvious files in /etc/, just /etc/hbase/conf and its HBase files.

Any ideas on debugging?

Answer 1:

We had this exact same problem with our HBase Thrift setup, and ended up using a watchdog script that restarts Thrift if it's not running.

Are you hitting your HBase server hard several times a day? That could cause this. We found no way around it: Thrift seems to take up (or leak) a lot of memory every time it's used, so you need a watchdog script.
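A minimal sketch of such a watchdog, not the script we actually ran. The function names, the THRIFT_PATTERN and START_CMD variables, and the `run` argument are all made up for illustration. It also assumes the Thrift JVM can be recognised by its main class name on the command line and that hbase-daemon.sh is on the PATH, so adjust both for your install.

```shell
#!/bin/sh
# Hypothetical watchdog sketch: start the HBase Thrift server if no
# matching process is found. Names and defaults are assumptions.

# Assumed pattern identifying the Thrift server JVM in the process list.
THRIFT_PATTERN="${THRIFT_PATTERN:-org.apache.hadoop.hbase.thrift.ThriftServer}"
# Assumed start command; use a full path if cron's PATH lacks it.
START_CMD="${START_CMD:-hbase-daemon.sh start thrift}"

# Succeeds if a process whose command line matches the pattern exists.
thrift_running() {
    pgrep -f "$THRIFT_PATTERN" >/dev/null 2>&1
}

# Starts Thrift only when it is not already running.
ensure_thrift() {
    if ! thrift_running; then
        echo "$(date): Thrift not running, starting it" >&2
        $START_CMD
    fi
}

# Only act when invoked as "watchdog.sh run", so the file can also be
# sourced without side effects.
case "${1:-}" in
    run) ensure_thrift ;;
esac
```

Saved as, say, watchdog.sh, it could be called from cron every minute with the `run` argument. It only notices a dead process, not one that is alive but hung.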

If a watchdog script is too heavy-duty, you could use a simple cron job to restart Thrift at regular intervals to make sure it stays up.

The following cron entry restarts Thrift every two hours:

0 */2 * * * hbase-daemon.sh restart thrift


Answer 2:

Using /etc/hbase/conf/hbase-env.sh, I increased my heap size, and this addressed the crashing issue.

# The maximum amount of heap to use, in MB. Default is 1000.
export HBASE_HEAPSIZE=8000
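If you also want to see what is filling the heap, one option is to ask the JVM for a heap dump on OutOfMemoryError. This is only a sketch: both -XX flags are standard HotSpot options, but /var/log/hbase as the dump location is my assumption, and HBASE_OPTS applies to every HBase daemon started with this environment, not just Thrift.

```shell
# Sketch of an addition to /etc/hbase/conf/hbase-env.sh.
# /var/log/hbase is an assumed dump directory; it must be writable by
# the hbase user and have room for a file roughly the size of the heap.
export HBASE_OPTS="${HBASE_OPTS:-} -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/hbase"
```

The resulting .hprof file can then be opened in a heap analyzer to see which objects dominate.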

Thanks to Harsh J on the CDH Users mailing list for helping me figure this out. As he pointed out, my lack of log messages indicates that a kill -9 is probably taking place:

Indeed if a shutdown handler message is missing in the log tail pre-crash, there may have been a kill -9 passed to the process via the OOM handler.
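To check for this quickly, you can look for the OutOfMemoryError marker in the .out file rather than the .log file. A rough sketch, where the count_oom function and the OUT_FILE variable are made-up names and the default path is the one from the question:

```shell
#!/bin/sh
# Sketch: report how many OutOfMemoryError lines a Thrift .out file holds.
# OUT_FILE defaults to the path from the question; adjust for your host.
OUT_FILE="${OUT_FILE:-/var/log/hbase/hbase-hbase-thrift-myserver.out}"

# Prints the number of lines containing the literal OOM marker.
# An unreadable or missing file is reported as 0.
count_oom() {
    if [ -r "$1" ]; then
        grep -cF 'java.lang.OutOfMemoryError' "$1"
    else
        echo 0
    fi
}

echo "OOM lines in $OUT_FILE: $(count_oom "$OUT_FILE")"
```

A non-zero count, together with a .log file that ends without a shutdown message, matches the kill -9 explanation above.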



Answer 3:

Increasing the heap size may not always be the solution.

According to this Cloudera blog, the Thrift server might be receiving invalid data. I would suggest enabling the framed transport and the compact protocol.

There's a catch: if you enable these on the server, the client must use the same transport and protocol.