Hadoop pig XPath returning empty attribute value

Posted 2019-09-01 05:28

I am using Cloudera Hadoop 2.6 and Pig 0.15.

I am trying to extract data from an XML file. Part of the file is shown below.

    <product productID="MICROLITEMX1600LAMP">
      <basicInfo>
        <category lang="NL" id="OT1006">Output Accessoires</category>
      </basicInfo>
    </product>

I can dump node values with the XPath() function, but not attribute values. The code below returns empty tuples instead of the productID.

    DEFINE XPath org.apache.pig.piggybank.evaluation.xml.XPath();
    allProducts = LOAD '/pathtofile/sample.xml' USING org.apache.pig.piggybank.storage.XMLLoader('product') AS (data:chararray);
    productsOneByOne = FOREACH allProducts GENERATE XPath(data, 'product/@productID') AS productid:chararray;
    dump productsOneByOne;

Please help me resolve this issue.

1 answer
不美不萌又怎样
#2 · 2019-09-01 06:14

This adds to the answers on "How to extract xml attributes using Xpath in Pig?"

There is a bug in XPath.java: it ignores the 4th parameter, so the ignoreNamespace flag cannot be changed from the Pig script.

Adding the following code to XPath.java and recompiling resolves the issue. Source file: http://svn.apache.org/repos/asf/pig/branches/branch-0.15/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/xml/XPath.java

    // Read the optional 4th argument; Tuple.get() returns Object, so cast it.
    if (input.size() > 3) {
        ignoreNamespace = (Boolean) input.get(3);
    }

The code above should be added before:

    if (ignoreNamespace) {
        xpathString = createNameSpaceIgnoreXpathString(xpathString);
    }
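
To see why the attribute comes back empty while the flag stays on, here is a small standalone sketch using only the JDK's javax.xml.xpath API. The helper ignoreNamespaces() is hypothetical: it is not the piggybank code, and it only assumes that the namespace-ignoring rewrite wraps every path step in a local-name() test on an element. Under that assumption the '@productID' step stops being an attribute step and matches nothing.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class AttributeXPathDemo {
    // The sample record from the question, as XMLLoader would hand it over.
    static final String XML =
        "<product productID=\"MICROLITEMX1600LAMP\">"
      + "<basicInfo><category lang=\"NL\" id=\"OT1006\">Output Accessoires</category></basicInfo>"
      + "</product>";

    // Hypothetical stand-in for a namespace-ignoring rewrite (an assumption,
    // not the piggybank implementation): each step becomes an element test
    // of the form *[local-name()='step'].
    static String ignoreNamespaces(String xpath) {
        StringBuilder sb = new StringBuilder();
        for (String step : xpath.split("/")) {
            if (sb.length() > 0) {
                sb.append('/');
            }
            sb.append("*[local-name()='").append(step).append("']");
        }
        return sb.toString();
    }

    // Evaluate an XPath expression against the sample XML and return its string value.
    static String evaluate(String xpath) throws XPathExpressionException {
        XPath xp = XPathFactory.newInstance().newXPath();
        return xp.evaluate(xpath, new InputSource(new StringReader(XML)));
    }

    public static void main(String[] args) throws Exception {
        String plain = "product/@productID";
        String rewritten = ignoreNamespaces(plain);
        System.out.println(plain + " -> [" + evaluate(plain) + "]");
        System.out.println(rewritten + " -> [" + evaluate(rewritten) + "]");
    }
}
```

In this sketch the plain expression yields MICROLITEMX1600LAMP, while the rewritten one yields an empty string, because it looks for a child element literally named '@productID'. That matches the symptom in the question and is why being able to switch the flag off matters.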