Proper ways to Put XML into HBase

Posted 2019-09-04 01:19

I am trying to load some locally stored XML files into HBase (version 1.1.X).

My goal is to store the content of those XML files in my HBase table as strings, using a map-only MapReduce job (no reduce stage), without first loading them into HDFS.

Here is my pseudo-code:

fetchXMLs(path);
XML2OneLineFile();
configureHBase(); // also establishes the connection
Map(input, output); // input: one XML file per line; output: an HBase Put
closeConnection();

Is this a correct way to tackle the problem, or are there better ways to do it?

PS: I do not want to parse or extract data from my XML files, just store them.

Thanks in advance

2 answers
贪生不怕死
#2 · 2019-09-04 01:44

HBase is not really designed for storing large objects. Depending on the size of your XML files, HBase might not be the solution you are looking for.

I am currently working on a data store that holds multiple file types, including XML. The approach I settled on is to store files under 1 MB in HBase and the rest in Hadoop (HDFS), keeping the metadata either in an SQL database or in HBase.

It depends a lot on what you want to achieve with this data.
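As a rough illustration of that split, here is a minimal sketch in plain Java. The class name `XmlStorageRouter`, the `Target` enum, and the `chooseTarget` method are hypothetical names made up for this example; only the 1 MB cutoff comes from the answer. It only decides where a file should go and does not talk to HBase or HDFS.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decides where a file should be stored based on its size.
class XmlStorageRouter {
    // Cutoff from the answer: files under 1 MB go to HBase.
    static final long HBASE_MAX_BYTES = 1024L * 1024L;

    enum Target { HBASE, HDFS }

    // Returns HBASE for files strictly smaller than the cutoff, HDFS otherwise.
    static Target chooseTarget(long sizeInBytes) {
        return sizeInBytes < HBASE_MAX_BYTES ? Target.HBASE : Target.HDFS;
    }

    public static void main(String[] args) {
        byte[] small = "<note>hi</note>".getBytes(StandardCharsets.UTF_8);
        System.out.println(chooseTarget(small.length));        // prints HBASE
        System.out.println(chooseTarget(5L * 1024L * 1024L));  // prints HDFS
    }
}
```

In a real loader, the size would probably come from something like `java.nio.file.Files.size(path)`, and the metadata record would note which store holds each file.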

ゆ 、 Hurt°
#3 · 2019-09-04 01:49

Instead of storing the XML in HBase as a string, you can store it as a byte[] and later get it back as an object (of a serializable type) by deserializing it.

You can do that with the Apache Commons Lang API, as shown below.

For example:

byte[] xmlInBytes = org.apache.commons.lang.SerializationUtils.serialize(obj); // obj must implement Serializable

To deserialize, you can do this:

Object restored = org.apache.commons.lang.SerializationUtils.deserialize(xmlInBytes);

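If the stored object is a W3C document (org.w3c.dom.Document), cast the returned Object back to that type. Note that this only works when the concrete document class implements Serializable.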

I have already tested this with many types of objects, not only XML. It should work the same way. Hope this helps.
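The round trip described in this answer can be sketched with the JDK alone. This is a stand-in for illustration, not the Commons Lang implementation: the class `XmlBytesRoundTrip` and its methods are hypothetical, built on `ObjectOutputStream` and `ObjectInputStream`. It also shows, for comparison, plain UTF-8 encoding, which may be enough when the XML only needs to be stored as text.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: JDK-only stand-in for a serialize/deserialize round trip.
class XmlBytesRoundTrip {
    // Turns any Serializable object into a byte[] using Java serialization.
    static byte[] serialize(Serializable obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Rebuilds the object from bytes; the caller casts it to the expected type.
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<note><to>HBase</to></note>";

        // Java serialization round trip.
        String restored = (String) deserialize(serialize(xml));
        System.out.println(restored.equals(xml)); // prints true

        // Plain UTF-8 round trip, with no serialization header in the stored bytes.
        byte[] utf8 = xml.getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(xml)); // prints true
    }
}
```

Either byte[] could then be used as the cell value of an HBase Put. Serialized bytes can only be read back by Java code that has the same class available, whereas UTF-8 bytes are readable by any client.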
