So Im trying to query my hbase cluster on Amazon ec2 using a custom jar i launch as a MapReduce step. Im my jar (inside the map function) I call Hbase as so:
public void map( Text key, BytesWritable value, Context contex ) throws IOException, InterruptedException {
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "tablename");
...
the problem is that when it gets to that HTable line and tries to connect to hbase, the step fails and I get the following errors:
2014-02-28 18:00:49,936 INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
2014-02-28 18:00:49,974 INFO [main] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is 5119@ip-10-0-35-130.ec2.internal
2014-02-28 18:00:49,998 INFO [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2014-02-28 18:00:50,005 WARN [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
...
2014-02-28 18:01:05,542 WARN [main] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
2014-02-28 18:01:05,542 ERROR [main] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 3 retries
2014-02-28 18:01:05,542 WARN [main] org.apache.hadoop.hbase.zookeeper.ZKUtil: hconnection Unable to set watcher on znode (/hbase/hbaseid)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
... and on and on
I can use the hbase shell just fine, and can query data and everything from the shell. I have no clue where to start and I've been googling for hours with no luck. Most of the problems like this on the internet dont talk about Amazon specific fixes. I thought zookeeper and hbase should automatically be connected properly by the amazon bootstrap.
Im using the hbase 0.94.17 jar and amazon is running hbase 0.94.7 im pretty sure thats not the problem, Im guessing its more me not setting up the Java code correctly. If anyone can help with this itd be greatly appreciated.Thanks
Well, after almost 30 hours of trying I've found the solution. There are many caveats to this, and versions are important.
In this case Im using amazon emr hadoop2 (ami 3.0.4) with Hbase 0.94.7 and Im trying to run a custom jar on the same cluster to access hbase locally through java.
So, the first thing is that the default hbase config will not work because of the external/internal IP idiosynchronicies that EC2 faces. So you cant use HConfiguration (because it defaults to a localhost quorum) What you'll have to do is use the configuration that amazon sets up for you (located in /home/hadoop/hbase/conf/hbase-site.xml) and just manually add it to a blank configuration object.
The connection code looks like this:
Secondly, you have to use the correct hbase jar PACKAGED into your custom jar. The reason is because hbase 94.x is compiled by default for hadoop1, so you have to grab the cloudera hbase jar named hbase-0.94.6-cdh4.3.0.jar (you can find this online) which has been compiled against hadoop2. If you don't do this part you will get many nasty, un-googleable errors including the org.apache.hadoop.net.NetUtils exception.