I'm trying to load XML data into Hive, but I'm getting this error:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"xmldata":""}
The XML file I used is:
<?xml version="1.0" encoding="UTF-8"?>
The Hive queries I used are:
1) CREATE TABLE xmltable(xmldata string) STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/home/user/xmlfile.xml' OVERWRITE INTO TABLE xmltable;
2) CREATE VIEW xmlview (id, genre, price) AS SELECT
xpath(xmldata, '/catalog[1]/book[1]/id'),
xpath(xmldata, '/catalog[1]/book[1]/genre'),
xpath(xmldata, '/catalog[1]/book[1]/price')
FROM xmltable;
3) CREATE TABLE xmlfinal AS SELECT * FROM xmlview;
4) SELECT * FROM xmlfinal WHERE id = '11';
Everything is fine up to the 2nd query, but when I execute the 3rd query I get the error below:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"xmldata":"<?xml version=\"1.0\" encoding=\"UTF-8\"?>"}
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:159)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"xmldata":"<?xml version=\"1.0\" encoding=\"UTF-8\"?>"}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:675)
at org.apache.hadoop.hive.ql.exec
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
So where is it going wrong? As far as I can tell, the XML file itself is correct.
Reason for the error:
1) Case 1 (your case): the XML content is being fed to Hive line by line.
Input XML:
<?xml version="1.0" encoding="UTF-8"?>
Check in Hive:
select count(*) from xmltable; -- returns 13 rows: each line of the file becomes its own row in column xmldata
Reason for the error:
The XML is read as 13 separate pieces rather than as one document, so xpath never sees a complete, valid XML document.
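To make case 1 concrete, here is a small sketch in Python. It uses Python's own XML parser as a stand-in for Hive's xpath UDFs (an assumption: the two parsers are not the same implementation), and the 13-line layout is my guess at the input file, derived from the single-line XML shown in case 2. `is_well_formed` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as a complete XML document."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False

# Assumed layout of the input file: one tag per line, 13 lines in total.
lines = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<catalog>",
    "<book>",
    "<id>11</id>",
    "<genre>Computer</genre>",
    "<price>44</price>",
    "</book>",
    "<book>",
    "<id>44</id>",
    "<genre>Fantasy</genre>",
    "<price>5</price>",
    "</book>",
    "</catalog>",
]

# What a line-by-line load hands to the parser: one fragment per row.
per_row = [is_well_formed(line) for line in lines]

# What the parser needs instead: the whole document as a single string.
whole = is_well_formed("\n".join(lines))
```

A few rows such as `<id>11</id>` happen to parse on their own, but the first row (the bare XML declaration) does not, and that is exactly the row named in the stack trace.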
2) Case 2: the XML content is fed to Hive as a single string, so the xpath UDFs work.
Syntax reference: all functions follow the form xpath_*(xml_string, xpath_expression_string).
<?xml version="1.0" encoding="UTF-8"?><catalog><book><id>11</id><genre>Computer</genre><price>44</price></book><book><id>44</id><genre>Fantasy</genre><price>5</price></book></catalog>
Check in Hive:
select count(*) from xmltable; -- returns 1 row: the XML is read as one complete document
That means:
xmldata = <?xml version="1.0" encoding="UTF-8"?><catalog><book> ...... </catalog>
Then apply your xpath UDF like this:
select xpath(xmldata, 'xpath_expression_string') from xmltable;
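One way to get from case 1 to case 2 is to collapse the file into a single line before running LOAD DATA. Below is a minimal sketch in Python, assuming the XML has no significant whitespace between tags and no newlines inside element text. `flatten_xml` is a hypothetical helper, not something Hive provides, and the multi-line sample is my assumed layout of the same catalog.

```python
import re

def flatten_xml(text: str) -> str:
    """Collapse a multi-line XML document into a single line."""
    # Drop whitespace (including newlines) between a closing '>' and the next '<'.
    collapsed = re.sub(r">\s+<", "><", text)
    # Trim leading and trailing whitespace so no blank row is left at the end.
    return collapsed.strip()

# Assumed multi-line form of the catalog used in this answer.
sample = """<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <book>
    <id>11</id>
    <genre>Computer</genre>
    <price>44</price>
  </book>
  <book>
    <id>44</id>
    <genre>Fantasy</genre>
    <price>5</price>
  </book>
</catalog>
"""

one_line = flatten_xml(sample)
```

Writing `one_line` back to a file and loading that file should give a table with a single row, which is what the count check above expects.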
The example below uses the Brickhouse jar, which provides the array_index and numeric_range UDFs.
-- Load XML data into the table
DROP TABLE xmltable;
CREATE TABLE xmltable(xmldata string) STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/home/vijay/data-input.xml' OVERWRITE INTO TABLE xmltable;
-- Check contents
SELECT * FROM xmltable;
-- Create view (each xpath call returns an array of matches)
DROP VIEW MyxmlView;
CREATE VIEW MyxmlView(id, genre, price) AS
SELECT
xpath(xmldata, 'catalog/book/id/text()'),
xpath(xmldata, 'catalog/book/genre/text()'),
xpath(xmldata, 'catalog/book/price/text()')
FROM xmltable;
-- Check view
SELECT id, genre, price FROM MyxmlView;
-- Add the Brickhouse jar and register its UDFs
ADD JAR /home/vijay/brickhouse-0.7.0-SNAPSHOT.jar;
CREATE TEMPORARY FUNCTION array_index AS 'brickhouse.udf.collect.ArrayIndexUDF';
CREATE TEMPORARY FUNCTION numeric_range AS 'brickhouse.udf.collect.NumericRange';
-- Turn the arrays into one row per book
SELECT
array_index( id, n ) as my_id,
array_index( genre, n ) as my_genre,
array_index( price, n ) as my_price
from MyxmlView
lateral view numeric_range( size( id )) MyxmlView as n;
hive > SELECT
> array_index( id, n ) as my_id,
> array_index( genre, n ) as my_genre,
> array_index( price, n ) as my_price
> from MyxmlView
> lateral view numeric_range( size( id )) MyxmlView as n;
Automatically selecting local only mode for query
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Execution log at: /tmp/vijay/.log
Job running in-process (local Hadoop)
Hadoop job information for null: number of mappers: 0; number of reducers: 0
2014-07-09 05:36:45,220 null map = 0%, reduce = 0%
2014-07-09 05:36:48,226 null map = 100%, reduce = 0%
Ended Job = job_local_0001
Execution completed successfully
Mapred Local Task Succeeded . Convert the Join into MapJoin
my_id my_genre my_price
11 Computer 44
44 Fantasy 5
Time taken: 8.541 seconds, Fetched: 2 row(s)
Adding more info, as requested by the question owner:
First try to load the file with add file path-to-file. That solved the problem in my case, so it should solve yours too.
Oracle XML Extensions for Hive can be used to create Hive tables over XML like this.
Then follow the steps below to get the result you want. Just change the source data like this:
Now try the steps below:
select xpath(xmldata, '/catalog/book/id/text()') as id,
xpath(xmldata, '/catalog/book/genre/text()') as genre,
xpath(xmldata, '/catalog/book/price/text()') as price FROM xmltable;
Now you will get output like this:
["11"] ["Computer"] ["44"]
["44"] ["Fantasy"] ["5"]
If you apply the xpath_string and xpath_int UDFs instead, you will get output like this:
11 Computer 44
44 Fantasy 5
Also make sure that the XML file doesn't contain any trailing whitespace or blank lines after the last closing tag.
In my case the source file had some, and whenever I loaded the file into Hive, the resulting table contained NULLs.
So whenever I applied an xpath function, the result would include a few empty arrays like these: [] [] [] [] [] []
The xpath_string function still worked, but the xpath_double and xpath_int functions never did. They kept throwing this exception:
Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"line":""}
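The empty row in that exception ({"line":""}) is the kind of row a blank line in the source file produces. A minimal cleanup sketch in Python, assuming one XML record per line, is shown below. `drop_blank_lines` is a hypothetical helper name, and you would run it on the file contents before LOAD DATA.

```python
def drop_blank_lines(text: str) -> str:
    """Strip trailing whitespace from every line and drop lines that are empty."""
    # Keep only lines with some non-whitespace content, trimmed on the right.
    kept = [line.rstrip() for line in text.splitlines() if line.strip()]
    # Rejoin without a trailing newline so no empty last row is created.
    return "\n".join(kept)

# Example input: one record followed by trailing spaces and blank lines.
dirty = "<catalog><book><id>11</id></book></catalog>   \n\n   \n"
clean = drop_blank_lines(dirty)
```

After this, every remaining line is a non-empty record, so the numeric xpath functions should no longer be handed an empty string.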