I'm trying to load XML data into Hive but I'm getting an error :
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"xmldata":""}
The xml file i have used is :
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<book>
<id>11</id>
<genre>Computer</genre>
<price>44</price>
</book>
<book>
<id>44</id>
<genre>Fantasy</genre>
<price>5</price>
</book>
</catalog>
The hive query i have used is :
1) Create TABLE xmltable(xmldata string) STORED AS TEXTFILE;
LOAD DATA lOCAL INPATH '/home/user/xmlfile.xml' OVERWRITE INTO TABLE xmltable;
2) CREATE VIEW xmlview (id,genre,price)
AS SELECT
xpath(xmldata, '/catalog[1]/book[1]/id'),
xpath(xmldata, '/catalog[1]/book[1]/genre'),
xpath(xmldata, '/catalog[1]/book[1]/price')
FROM xmltable;
3) CREATE TABLE xmlfinal AS SELECT * FROM xmlview;
4) SELECT * FROM xmlfinal WHERE id ='11
Till 2nd query everything is fine but when i executed the 3rd query it's giving me error:
The error is as below:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"xmldata":"<?xml version=\"1.0\" encoding=\"UTF-8\"?>"}
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:159)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"xmldata":"<?xml version=\"1.0\" encoding=\"UTF-8\"?>"}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:675)
at org.apache.hadoop.hive.ql.exec
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
So where it's going wrong? Also I'm using the proper xml file.
Thanks, Shree
First try to load file my add file path-to-file, that will solve your problem as It is solved in my case
Oracle XML Extensions for Hive can be used to create Hive tables over XML like this. https://docs.oracle.com/cd/E54130_01/doc.26/e54142/oxh_hive.htm#BDCUG691
Reason for error :
1) case-1 : (your case) - xml content is being fed to hive as line by line.
input xml:
check in hive :
Reason for err :
XML is being read as 13 pieces not at unified. so invalid XML
2) case-2 : xml content should be fed to hive as singleString - XpathUDFs works refer syntax : All functions follow the form: xpath_(xml_string, xpath_expression_string).* source
input.xml
check in hive:
Means :
then apply your xpathUDF like this
then follow the below steps to get the solution as like as you want, just change the source data this
now try below steps:
now you will get ans as like this:
["11"] ["Computer"] ["44"]
["44"] ["Fantasy"] ["5"]
if you apply xapth_string, xpath_int, xpath_int udfs the you will get ans like
11 computer 44
44 Fantasy 5.
Thanks
Find Jar here -- > Brickhouse ,
sample example here --> Example
similar example in stackoverflow - here
Solution:
Output:
Time taken: 8.541 seconds, Fetched: 2 row(s)
Adding-more-info as requested by Question owner:
Also ensure that the XML file doesn't contain any empty spaces at the end of the last closing tag. In my case, the source file had one, and whenever I loaded the file into hive, my resulting table contained NULLS in them. So whenever I applied an xpath function, the result would have a few of these [] [] [] [] [] []
Although the xpath_string function worked, the xpath_double and xpath_int functions never did. It kept throwing this exception -