I'm running Cloudera CDH4 with HBase and the HBase Thrift Server. Several times a day, the Thrift Server crashes.
In /var/log/hbase/hbase-hbase-thrift-myserver.out, there is this:
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
# Executing /bin/sh -c "kill -9 8151"...
In /var/log/hbase/hbase-hbase-thrift-myserver.log, there are no error messages at the end of the file. There are only a lot of DEBUG messages stating that one of the nodes is caching a particular file.
I can't find any configuration options for the HBase Thrift Server. There are no obvious files in /etc/, just /etc/hbase/conf and its HBase files.
Any ideas on debugging?
We had this exact same problem with our HBase Thrift setup, and ended up using a watchdog script that restarts Thrift if it's not running.
Are you hitting your HBase server hard, several times a day? That could cause this. I don't know of a way around it: Thrift does seem to take up (or leak) a lot of memory every time it's used, so you need a watchdog script.
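A watchdog along these lines can be sketched as below. This is only a minimal sketch, not the script we actually ran: the PID file path, the variable names and the function names are all assumptions for illustration, so check where your install writes the Thrift PID file before using it.

```shell
#!/bin/sh
# Hypothetical watchdog sketch for the HBase Thrift server.
# ASSUMPTION: the PID file location below; adjust it to your install.
THRIFT_PID_FILE="${THRIFT_PID_FILE:-/var/run/hbase/hbase-hbase-thrift.pid}"
# Command used to bring Thrift back up (same script as the cron example).
THRIFT_START_CMD="${THRIFT_START_CMD:-hbase-daemon.sh start thrift}"

# Succeeds only if the PID file exists and that PID is alive.
# kill -0 sends no signal; it just checks that the process can be signalled,
# so run this as the hbase user or as root.
thrift_running() {
    [ -r "$THRIFT_PID_FILE" ] || return 1
    pid=$(cat "$THRIFT_PID_FILE")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Starts Thrift if it is not running; does nothing otherwise.
ensure_thrift() {
    if thrift_running; then
        return 0
    fi
    echo "$(date): Thrift server not running, starting it"
    # Left unquoted on purpose so the command splits into words.
    $THRIFT_START_CMD
}
```

To use it, add a final line that calls `ensure_thrift` and run the script from cron every minute or so. A stale PID file whose number has been reused by another process would fool this check, so treat it as a starting point.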
If a watchdog script is too heavy-duty, you could use a simple cron job to restart Thrift at regular intervals to make sure it stays up.
The following cron entry restarts Thrift every two hours:
0 */2 * * * hbase-daemon.sh restart thrift
Using /etc/hbase/conf/hbase-env.sh, I increased my heap size, and this addressed the crashing issue.
# The maximum amount of heap to use, in MB. Default is 1000.
export HBASE_HEAPSIZE=8000
Thanks to Harsh J on the CDH Users mailing list for helping me figure this out. As he pointed out, my lack of log messages indicates that a kill -9 is probably taking place:
Indeed if a shutdown handler message is missing in the log tail
pre-crash, there may have been a kill -9 passed to the process via the
OOM handler.
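On the debugging side, if the crashes come back even with a bigger heap, a heap dump taken at the moment of the OOM can show what is filling the heap. A possible addition to hbase-env.sh is sketched below. It assumes your bin/hbase picks up HBASE_THRIFT_OPTS for the thrift command, which is worth confirming on your version; the two -XX flags are standard HotSpot options.

```shell
# Sketch of an addition to /etc/hbase/conf/hbase-env.sh.
# ASSUMPTION: bin/hbase appends HBASE_THRIFT_OPTS to the JVM options
# when it starts the thrift command.
# Write a heap dump into /var/log/hbase when the JVM hits OutOfMemoryError.
export HBASE_THRIFT_OPTS="${HBASE_THRIFT_OPTS:-} -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/hbase"
```

A dump can be roughly as large as the heap itself, so make sure the target directory has room for it.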
Increasing the heap size may not always be the solution.
As per this Cloudera blog, the Thrift server might be receiving invalid data.
I would suggest enabling the framed transport and the compact protocol.
There's a catch: if you enable these on the server, the client must use the same transport and protocol.
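As a rough sketch, a restart with both options turned on from the command line could look like this. The -f (framed) and -c (compact) flags are taken from the ThriftServer usage text as I remember it, so confirm them against the help output of your version. The `run` helper and the `DRY_RUN` switch are hypothetical names added so that the snippet only prints the commands by default.

```shell
#!/bin/sh
# Sketch: restart Thrift with framed transport and the compact protocol.
# ASSUMPTION: -f selects framed transport and -c selects the compact protocol.
THRIFT_ARGS="-f -c"
# Hypothetical safety switch: 1 prints the commands, anything else runs them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run hbase-daemon.sh stop thrift
# THRIFT_ARGS is unquoted on purpose so it splits into separate flags.
run hbase-daemon.sh start thrift $THRIFT_ARGS
```

If Thrift is started by a service script instead, I believe the equivalent settings are the hbase.regionserver.thrift.framed and hbase.regionserver.thrift.compact properties in hbase-site.xml, but verify those names for your release.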