Parse repeating XML tags in Hive

Posted 2019-03-04 04:48

Question:

I am using hivexmlserde to parse XML files. Some tags repeat within each document, and I store each of them as array<string>. The result I get is shown below.

["completed","completed","completed","completed","completed","completed","completed","completed","completed","completed","completed","completed","completed","completed","completed","completed","completed","completed","completed"]   ["10160-0"] ["20140403","20151207","20160313","20101225","20100420","20110208","20100419","20110310","20100412","20120130","20110729"]  ["20160306","20110822","20110822","20110822","20110321","20110608","20110822","20120326","20110822"]    ["24","12","24","24","7","24","8","8","7","24","24","24","24","6"]  ["h","h","h","h","d","h","h","h","d","h","h","h","h","h"]

I want the result to look like this:

-------------------------------------------------------------------------------
| statusCode | code    | startTime | endTime  | strengthValue | strengthUnits |
-------------------------------------------------------------------------------
| completed  | 10160-0 | 20140403  | 20160306 | 24            | h             |
| completed  | 10160-0 | 20151207  | 20110822 | 12            | h             |
| completed  | 10160-0 | 20160313  | 20120326 | 24            | h             |
| completed  | 10160-0 | 20100412  | 20110608 | 24            | h             |
| completed  | 10160-0 | 20110310  | 20110822 | 7             | d             |
| completed  | 10160-0 | 20110822  | 20110822 | 8             | h             |
-------------------------------------------------------------------------------

Please help me achieve this using the Hive XML SerDe.

Update:

Sample:

<document>
 <code>10160-0</code>
 <entryInfo> 
    <statusCode>completed</statusCode>
    <startTime>20110729</startTime>
    <endTime>20110822</endTime>
    <strengthValue>24</strengthValue>
    <strengthUnits>h</strengthUnits>
 </entryInfo> 
 <entryInfo>
    <statusCode>completed</statusCode>
    <startTime>20120130</startTime>
    <endTime>20120326</endTime>
    <strengthValue>12</strengthValue>
    <strengthUnits>h</strengthUnits>
 </entryInfo>
 <entryInfo>
    <statusCode>completed</statusCode>
    <startTime>20100412</startTime>
    <endTime>20110822</endTime>
    <strengthValue>8</strengthValue>
    <strengthUnits>d</strengthUnits>
 </entryInfo>
</document>
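
To make the target shape concrete, here is a minimal Python sketch of the row-wise result, run outside Hive. It is not a hivexmlserde solution. It only illustrates that the values have to be read per <entryInfo> block: the arrays in the output above have different lengths (for example 11 start times but 9 end times), so they cannot be lined up by position afterwards. The function name `flatten` and the constant names are hypothetical, and the embedded sample assumes that the strengthUnits closing tag is well-formed and that the third block is spelled <entryInfo> like the other two.

```python
import xml.etree.ElementTree as ET

# Sample document from the question. Assumptions: the strengthUnits closing
# tag is well-formed and all three blocks are spelled <entryInfo>.
SAMPLE = """<document>
 <code>10160-0</code>
 <entryInfo>
    <statusCode>completed</statusCode>
    <startTime>20110729</startTime>
    <endTime>20110822</endTime>
    <strengthValue>24</strengthValue>
    <strengthUnits>h</strengthUnits>
 </entryInfo>
 <entryInfo>
    <statusCode>completed</statusCode>
    <startTime>20120130</startTime>
    <endTime>20120326</endTime>
    <strengthValue>12</strengthValue>
    <strengthUnits>h</strengthUnits>
 </entryInfo>
 <entryInfo>
    <statusCode>completed</statusCode>
    <startTime>20100412</startTime>
    <endTime>20110822</endTime>
    <strengthValue>8</strengthValue>
    <strengthUnits>d</strengthUnits>
 </entryInfo>
</document>"""

# Per-entry tags that follow statusCode and code in the desired table.
ENTRY_FIELDS = ("startTime", "endTime", "strengthValue", "strengthUnits")


def flatten(xml_text):
    """Return one tuple per <entryInfo>, with the document-level code repeated."""
    root = ET.fromstring(xml_text)
    code = root.findtext("code")
    rows = []
    for entry in root.findall("entryInfo"):
        # findtext returns None for a missing child, so a gap stays in its
        # own row instead of shifting later values into the wrong row.
        row = (entry.findtext("statusCode"), code)
        row += tuple(entry.findtext(name) for name in ENTRY_FIELDS)
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in flatten(SAMPLE):
        print(" | ".join("" if value is None else value for value in row))
```

In Hive terms, the same idea would presumably mean treating each <entryInfo> element as one array item and expanding that array into rows (for example with LATERAL VIEW explode), rather than mapping every child tag to its own independent array. The exact SerDe column mapping for that is not shown here.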