I'm running Cloudera CDH4 with HBase and the HBase Thrift Server. Several times a day, the Thrift Server crashes.
In /var/log/hbase/hbase-hbase-thrift-myserver.out, there is this:
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
# Executing /bin/sh -c "kill -9 8151"...
In /var/log/hbase/hbase-hbase-thrift-myserver.log, there are no error messages at the end of the file. There are only a lot of DEBUG messages stating that one of the nodes is caching a particular file.
I can't find any configuration options for the HBase Thrift Server. There are no obvious files in /etc/, just /etc/hbase/conf and its HBase files.
Any ideas on debugging?
Increasing the heap size may not always be the solution.
As per this Cloudera blog,
the Thrift server might be receiving invalid data. I would suggest enabling the framed transport and the compact protocol.
There's a catch: if you enable these on the server, the client must use the same transport and protocol.
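As a rough sketch of what enabling both might look like when the Thrift server is launched by hand, the HBase Thrift server accepts `-f` (framed) and `-c` (compact) on its command line. This assumes the `hbase` launcher is on the PATH; a packaged CDH install normally starts the server through its init script, in which case the equivalent settings would go in hbase-site.xml instead (as far as I know, `hbase.regionserver.thrift.framed` and `hbase.regionserver.thrift.compact`).

```shell
# Assumption: started manually rather than via the CDH init script.
# -f enables the framed transport, -c enables the compact protocol.
hbase thrift start -f -c
```

Clients then have to be switched to a framed transport and the compact protocol as well, otherwise they will fail to talk to the server.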
We had this exact same problem with our HBase Thrift setup, and ended up using a watchdog script that restarts Thrift if it's not running.
Are you hitting your HBase server hard, several times a day? That could result in this. There is no way around it: Thrift does seem to take up (or leak) a lot of memory every time it's used, so you need a watchdog script.
If a watchdog script is too heavy-duty, you could use a simple cron job to restart Thrift at frequent intervals to make sure it stays up.
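A minimal sketch of such a watchdog, not the script we actually ran. The service name `hbase-thrift` and the process pattern are assumptions based on a typical CDH package install, so check them against your own system.

```shell
#!/bin/sh
# Hypothetical watchdog for the HBase Thrift server.
# SERVICE and PATTERN are assumptions; adjust them for your install.
SERVICE="${SERVICE:-hbase-thrift}"
PATTERN="${PATTERN:-org.apache.hadoop.hbase.thrift.ThriftServer}"

# Succeeds if a process whose command line matches PATTERN exists.
thrift_running() {
    pgrep -f "$PATTERN" >/dev/null 2>&1
}

# Restarts the Thrift server through the init script.
restart_thrift() {
    service "$SERVICE" restart
}

# Checks once and restarts only if the server is not running.
watchdog() {
    if thrift_running; then
        echo "thrift is running"
    else
        echo "thrift is down, restarting"
        restart_thrift
    fi
}

# When installing this as a script, call `watchdog` as the last line.
```

Run from cron every minute or so (with a final `watchdog` call added), this brings Thrift back shortly after the JVM is killed, without restarting a healthy server.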
The following cron restarts Thrift every two hours.
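A hypothetical crontab entry along these lines might look like the following. The path to `service` and the service name `hbase-thrift` are assumptions, and it would go in root's crontab.

```shell
# min hour  dom mon dow  command
0     */2   *   *   *    /sbin/service hbase-thrift restart
```

The `*/2` in the hour field fires at minute 0 of every second hour.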
Using /etc/hbase/conf/hbase-env.sh, I increased my heap size, and this addressed the crashing issue.
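For reference, a sketch of the kind of change involved. `HBASE_HEAPSIZE` is the standard heap setting in hbase-env.sh and is given in megabytes in HBase releases of this era; the value below is only an illustrative assumption, not the one I used.

```shell
# /etc/hbase/conf/hbase-env.sh
# Maximum heap, in MB, for HBase daemons started with this environment.
# 2048 is an example value; size it to your workload and available RAM.
export HBASE_HEAPSIZE=2048
```

The Thrift server has to be restarted for the new heap size to take effect.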
Thanks to Harsh J on the CDH Users mailing list for helping me figure this out. As he pointed out, the lack of log messages indicates that a kill -9 is probably taking place, which matches the -XX:OnOutOfMemoryError="kill -9 %p" line in the .out file.