I would like to be able to write new entries into HBase from a distributed (not local) Storm topology. There exist a few GitHub projects that provide either HBase Mappers or pre-made Storm bolts to write Tuples into HBase. These projects provide instructions for executing their samples on the LocalCluster.
The problem that I am running into with both of these projects, and with accessing the HBase API directly from the bolt, is that they all require the hbase-site.xml file to be on the classpath. With the direct API approach (and perhaps with the GitHub projects as well), when you execute HBaseConfiguration.create(), it will try to find the information it needs from an entry on the classpath.
How can I modify the classpath for the Storm bolts to include the HBase configuration file?
Update: Using danehammer's answer, this is how I got it working.
Copy the following files into your ~/.storm directory:
- hbase-common-0.98.0.2.1.2.0-402-hadoop2.jar
- hbase-site.xml
- storm.yaml (NOTE: if you do not copy storm.yaml into that directory, then the storm jar command will NOT put that directory on the classpath; see the storm.py Python script to verify that logic for yourself. It would be nice if this were documented.)
Next, in your topology class's main method, get the HBase Configuration and serialize it:
// Requires org.apache.hadoop.conf.Configuration,
// org.apache.hadoop.hbase.HBaseConfiguration, and org.apache.hadoop.io.DataOutputBuffer
final Configuration hbaseConfig = HBaseConfiguration.create();
final DataOutputBuffer databufHbaseConfig = new DataOutputBuffer();
hbaseConfig.write(databufHbaseConfig);
final byte[] baHbaseConfigSerialized = databufHbaseConfig.getData();
Pass the byte array to your spout class through the constructor. The spout class saves this byte array to a field. (Do not deserialize in the constructor: I found that if the spout has a Configuration field, you get a serialization exception when submitting the topology, because Hadoop's Configuration is not Java-serializable.)
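A minimal sketch of that spout wiring (the class name HBaseScanSpout is illustrative, not from the original post), assuming the Storm 0.9.x backtype.storm API:

```java
import java.util.Map;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;

public class HBaseScanSpout extends BaseRichSpout {
    // Keep only the raw bytes as a field: byte[] is serializable,
    // while Hadoop's Configuration is not.
    private final byte[] baHbaseConfigSerialized;

    public HBaseScanSpout(byte[] baHbaseConfigSerialized) {
        // Store the bytes; do NOT deserialize here.
        this.baHbaseConfigSerialized = baHbaseConfigSerialized;
    }

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        // Deserialize the Configuration and open the scanner here,
        // on the worker, once the spout has been shipped to the cluster.
    }

    @Override
    public void nextTuple() {
        // Pull the next Result from the scanner and emit it.
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("row"));
    }
}
```

The constructor runs on the machine submitting the topology, while open runs on the worker, which is why the deserialization belongs in open.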
In the spout's open method, deserialize the config and access the HBase table:
// Rehydrate the Configuration from the serialized bytes
Configuration hBaseConfiguration = new Configuration();
ByteArrayInputStream bas = new ByteArrayInputStream(baHbaseConfigSerialized);
hBaseConfiguration.readFields(new DataInputStream(bas));
HTable tbl = new HTable(hBaseConfiguration, HBASE_TABLE_NAME);
Scan scan = new Scan();
scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("YOUR_COLUMN"));
scnrTbl = tbl.getScanner(scan); // scnrTbl is a ResultScanner field on the spout
Now, in your nextTuple method you can use the scanner to get the next row:
Result rslt = scnrTbl.next();
Extract what you want from the result, and pass those values in some serializable object to the bolts.
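One way to carry those extracted values is a small serializable value object; a hypothetical example (the class and field names are illustrative, not from the original post):

```java
import java.io.Serializable;

// A plain, Java-serializable carrier for values pulled out of an HBase Result,
// safe to pass from the spout to the bolts inside a tuple.
public class WaveformRecord implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String rowKey;
    private final byte[] value;

    public WaveformRecord(String rowKey, byte[] value) {
        this.rowKey = rowKey;
        this.value = value;
    }

    public String getRowKey() { return rowKey; }
    public byte[] getValue() { return value; }
}
```

In nextTuple you would populate one of these from the Result (for example, from rslt.getRow() and rslt.getValue(...)) and emit it in a Values tuple.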
When you deploy a topology with the "storm jar" command, the ~/.storm folder will be on the classpath (see this link under the jar command). If you placed the hbase-site.xml file (or related *-site.xml files) in that folder, HBaseConfiguration.create() during "storm jar" would find that file and correctly return an org.apache.hadoop.conf.Configuration. That Configuration would then need to be serialized and stored within your topology in order to distribute the config around the cluster.