I am trying to load some locally stored XML files into HBase (version 1.1.x).
My goal is to store the content of each XML file as a string in an HBase table, using a map-only MapReduce job (no reduce stage), without first loading the files into HDFS.
Here is my pseudo-code:
fetchXMLs(path);
XML2OneLineFile();
configureHBase(); // + establishing connection
Map(input, output); // input: one XML file collapsed onto one line; output: an HBase Put
closeConnection();
Is this a reasonable way to tackle the problem, or are there better ways to do it?
PS: I do not want to parse or extract data from my XML files, just store them.
Thanks in advance
HBase is not really designed for storing large objects. Depending on the size of your XML files, HBase might not be the solution you are looking for.
I am currently working on a data store that holds multiple file types, including XML. The approach I settled on is to store any file under 1 MB in HBase and the rest in Hadoop, keeping the metadata either in an SQL database or in HBase.
It depends a lot on what you want to achieve with this data.
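The size-based split described above could be sketched roughly as follows. This is only an illustration: the class StorageRouter, the Target enum, and the constant name are hypothetical, and it assumes that "Hadoop" here means HDFS and that 1 MB means 1024 * 1024 bytes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: picks a storage target for a file based on its size.
final class StorageRouter {
    // Threshold taken from the answer: files under 1 MB go to HBase.
    // Assumption: 1 MB is interpreted as 1024 * 1024 bytes.
    static final long HBASE_MAX_BYTES = 1024L * 1024L;

    // Assumption: "the rest to Hadoop" is read as "store the file in HDFS".
    enum Target { HBASE, HDFS }

    // Decide from a known size in bytes.
    static Target route(long sizeInBytes) {
        return sizeInBytes < HBASE_MAX_BYTES ? Target.HBASE : Target.HDFS;
    }

    // Decide from a local file by asking the file system for its size.
    static Target route(Path file) throws IOException {
        return route(Files.size(file));
    }
}
```

The caller would then either build an HBase Put or copy the file to HDFS, depending on the returned Target, and record the chosen location in the metadata store.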
Instead of storing the XML as a string in HBase, you can store it as a byte[] and later retrieve it as an object (of a serializable type) by deserializing it.
You can do that as follows, using the Apache Commons API.
For example:
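The original snippet is not available here, so the following is only a sketch of the serialization step. It uses plain JDK object serialization so that it runs without extra dependencies; SerializationUtils.serialize from Apache Commons Lang does essentially the same thing in a single call. The class name XmlSerializer is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical helper: turns any Serializable value into a byte[].
final class XmlSerializer {
    static byte[] serialize(Serializable value) {
        try (ByteArrayOutputStream buffer = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
            // Push everything into the buffer before copying it out.
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            // Also covers NotSerializableException, which is an IOException.
            throw new UncheckedIOException(e);
        }
    }
}
```

The resulting byte[] is what you would pass as the cell value when building the HBase Put, for instance through Put.addColumn(family, qualifier, value).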
To deserialize, you can do this:
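Again, the original snippet is missing, so this is a sketch under the same assumptions: plain JDK deserialization standing in for SerializationUtils.deserialize from Apache Commons Lang. The names XmlDeserializer and sampleBytes are hypothetical, and sampleBytes exists only to make the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical helper: turns a serialized byte[] back into an object.
final class XmlDeserializer {
    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            // The caller casts the result to the type that was stored.
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Class of stored object not on classpath", e);
        }
    }

    // Stand-in for bytes read back from HBase; only here so the sketch runs alone.
    static byte[] sampleBytes(Serializable value) {
        try (ByteArrayOutputStream buffer = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real job the byte[] would come from the HBase read, for instance Result.getValue(family, qualifier), rather than from sampleBytes.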
If the stored object is a W3C Document (org.w3c.dom.Document), the returned Object should be cast to that type. I have already tested this with many types of objects, not only XML, and it should work the same way. Hope this helps.