I want to store xml data into hive table, XML data :
<servicestatuslist>
<recordcount>1266</recordcount>
<servicestatus id="435680">
<status_text>/: 61%used(9714MB/15975MB) (<80%) : OK</status_text>
<display_name>/ Disk Usage</display_name>
<host_name>zabbix.vshodc.com</host_name>
</servicestatus>
</servicestatuslist>
I have added jar file to path
hive> add jar /home/cloudera/HiveJars/hivexmlserde-1.0.5.1.jar ;
Added /home/cloudera/HiveJars/hivexmlserde-1.0.5.1.jar to class path
Added resource: /home/cloudera/HiveJars/hivexmlserde-1.0.5.1.jar
I have written a hive serDe query:
create table xml_AIR(id STRING, status_text STRING,display_name STRING ,host_name STRING)
row format serde 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
with serdeproperties(
"column.xpath.id"="/servicestatus/@id",
"column.xpath.status_text"="/servicestatus/status_text/text()",
"column.xpath.display_name"="/servicestatus/display_name/text()",
"column.xpath.host_name"="/servicestatus/host_name/text()"
)
stored as
inputformat 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/user/cloudera/input/air.xml'
tblproperties(
"xmlinput.start"="<servicestatus",
"xmlinput.end"="</servicestatus>"
);
OK
Time taken: 1.609 seconds
When I issued select command , it didn't show the table's data:
hive> select * from xml_AIR;
OK
Time taken: 3.0 seconds
What's wrong in the above code? Please help.
Well, the code looks good. As per the example in this link, it should work for you.
Btw, there is a typo in the code that you have provided. In the table definition status_test STRING should be status_text STRING or vice versa.
LOCATION give directory only instead of file
According to Hive DDL documentation, LOCATION clause expects an hdfs_path. Hence, try specifying only the directory, not the whole path to your XML file. By using LOAD after CREATE TABLE, you cannot have external tables, which might be an interesting approach in some cases.
Reference: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/TruncateTable
The entire XML file should be a single line (i.e. no newlines in the XML). (A simple unix command to strip newlines is tr '\n\r' ' ' < source.xml > processed.xml.)
https://github.com/dvasilen/Hive-XML-SerDe/wiki/XML-data-sources
I came out through the same Problem when dealing with XML Serde. After some struggle, I fixed it by using the "Load data" statement separately and avoiding addition of "LOCATION" property in "CREATE" statement. the following is my XML data.
CREATE TABLE Statement:
CREATE Query Result:
for the above create statement, I used the following "LOAD DATA" statement to load the data contained in an XML file in to the above created table.
LOAD Query Result:
And finally,
SELECT Query and Result:
And in the above query i would suggest the value for
"xmlinput.start"
as"<servicestatus id"
, instead of"<servicestatus"
,because the XML start tag is in the pattern<servicestatus id="some data">
.I believe this would be helpful for you.