Using XSL to convert XML to a text file just returns

Posted 2019-09-21 14:53

My company processes a lot of product feeds using Hadoop. We have a process that extracts exactly one PRODUCT node and makes it a line in a file. We then use XSL to convert each product's XML to a single line of a triple-pipe-delimited file. This has worked well so far. However, I ran into an issue with one client: they made some changes in the new XML file and are now using namespaces, which caused things to break. I had to modify the links in the XML so I could post it (I changed http to httc). The original XML file was set up like this:

<?xml version="1.0" encoding="utf-8"?>
<CATALOG APIKEY="88ac00e4f3e16e44" xmlns="urn:rrXML" xmlns:xsd="httc://www.w3.org/2001/XMLSchema" xmlns:xsi="httc://www.w3.org/2001/XMLSchema-instance">
<PRODUCTS>
  <PRODUCT ID="692174">
    <PRODUCTNAME>HP Pavilion g6t Laptop 3rd generation Intel® Core™ i5-3210M 2.5GHz SuperMulti 8X DVD+/-R/RW</PRODUCTNAME>
    <PRODUCTDESCRIPTION></PRODUCTDESCRIPTION>
    <PRODUCTSKU>100005487</PRODUCTSKU>
    <LISTPRICE>$499.99</LISTPRICE>
    <SALEPRICE xsi:type="xsd:string" xmlns:xsi="httc://www.w3.org/2001/XMLSchema-instance">$499.99</SALEPRICE>
    <PRODUCTURL>/.product.100005487.html</PRODUCTURL>
    <IMAGEURL>httc://images.test-static.com/image/media/150-__1</IMAGEURL>
    <RATING xsi:type="xsd:string" xmlns:xsi="httc://www.w3.org/2001/XMLSchema-instance">0.0</RATING>
    <BRAND>HEWLETT PACKARD</BRAND>
    <INSTOCK>1</INSTOCK>
    <REVIEWS xsi:type="xsd:string" xmlns:xsi="httc://www.w3.org/2001/XMLSchema-instance">0</REVIEWS>
    <KEYWORDS></KEYWORDS>
    <ACTIONBUTTONURL></ACTIONBUTTONURL>
    <PARENTPRODUCTID>100005487</PARENTPRODUCTID>
    <CATEGORIES />
    <ATTRIBUTES>
      <ATTRIBUTE NAME="Categories">Kaspersky Promotion</ATTRIBUTE>
      <ATTRIBUTE NAME="FSA">False</ATTRIBUTE>
      <ATTRIBUTE NAME="HIDEPRICEFROMBROWSE">False</ATTRIBUTE>
      <ATTRIBUTE NAME="ADDTOCARTFROMSEARCH">0</ATTRIBUTE>
      <ATTRIBUTE NAME="ITEMMINQTY">1.0</ATTRIBUTE>
      <ATTRIBUTE NAME="ITEMMAXQTY">1.0</ATTRIBUTE>
      <ATTRIBUTE NAME="MERCHANDISINGDESC"></ATTRIBUTE>
      <ATTRIBUTE NAME="DISCOUNTDESC"></ATTRIBUTE>
      <ATTRIBUTE NAME="ALTTEXT">HP Pavilion g6t Laptop 3rd generation Intel® Core™ i5-3210M 2.5GHz SuperMulti 8X DVD+/-R/RW</ATTRIBUTE>
      <ATTRIBUTE NAME="MAPITEM">False</ATTRIBUTE>
      <ATTRIBUTE NAME="MEMBERONLYITEM">False</ATTRIBUTE>
      <ATTRIBUTE NAME="Brand">HP</ATTRIBUTE>
      <ATTRIBUTE NAME="Graphic Card">Intel HD Graphics</ATTRIBUTE>
      <ATTRIBUTE NAME="Hard Drive Size">500 GB</ATTRIBUTE>
      <ATTRIBUTE NAME="Operating System">Windows ®</ATTRIBUTE>
      <ATTRIBUTE NAME="RAM Included">4 GB</ATTRIBUTE>
      <ATTRIBUTE NAME="Screen Size">15.6 in.</ATTRIBUTE>
    </ATTRIBUTES>
  </PRODUCT>

The new XML file is set up like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<CATALOG APIKEY="88ac00e4f3e16e44" xmlns="urn:rrXML" xmlns:xsd="httc://www.w3.org/2001/XMLSchema" xmlns:xsi="httc://www.w3.org/2001/XMLSchema-instance">
    <PRODUCTS>
        <PRODUCT ID="692174">
            <PRODUCTNAME>HP Pavilion g6t Laptop 3rd generation Intel® Core™ i5-3210M 2.5GHz SuperMulti 8X DVD+/-R/RW</PRODUCTNAME>
            <PRODUCTDESCRIPTION></PRODUCTDESCRIPTION>
            <PRODUCTSKU>100005487</PRODUCTSKU>
            <LISTPRICE>$499.99</LISTPRICE>
            <SALEPRICE xsi:type="xsd:string">$499.99</SALEPRICE>
            <PRODUCTURL>/.product.100005487.html</PRODUCTURL>
            <IMAGEURL>httc://images.test-static.com/image/media/150-__1</IMAGEURL>
            <RATING xsi:type="xsd:string">0.0</RATING>
            <BRAND>HEWLETT PACKARD</BRAND>
            <INSTOCK>1</INSTOCK>
            <REVIEWS xsi:type="xsd:string">0</REVIEWS>
            <KEYWORDS></KEYWORDS>
            <ACTIONBUTTONURL></ACTIONBUTTONURL>
            <PARENTPRODUCTID>100005487</PARENTPRODUCTID>
            <CATEGORIES>
                <CATEGORY ID="103510">
                    <CATEGORYNAME>Kaspersky Promotion</CATEGORYNAME>
                </CATEGORY>
            </CATEGORIES>
            <ATTRIBUTES>
                <ATTRIBUTE NAME="Categories">Kaspersky Promotion</ATTRIBUTE>
                <ATTRIBUTE NAME="FSA">False</ATTRIBUTE>
                <ATTRIBUTE NAME="HIDEPRICEFROMBROWSE">False</ATTRIBUTE>
                <ATTRIBUTE NAME="ADDTOCARTFROMSEARCH">0</ATTRIBUTE>
                <ATTRIBUTE NAME="ITEMMINQTY">1.0</ATTRIBUTE>
                <ATTRIBUTE NAME="ITEMMAXQTY">1.0</ATTRIBUTE>
                <ATTRIBUTE NAME="MERCHANDISINGDESC"></ATTRIBUTE>
                <ATTRIBUTE NAME="DISCOUNTDESC"></ATTRIBUTE>
                <ATTRIBUTE NAME="ALTTEXT">HP Pavilion g6t Laptop 3rd generation Intel® Core™ i5-3210M 2.5GHz SuperMulti 8X DVD+/-R/RW</ATTRIBUTE>
                <ATTRIBUTE NAME="MAPITEM">False</ATTRIBUTE>
                <ATTRIBUTE NAME="MEMBERONLYITEM">False</ATTRIBUTE>
                <ATTRIBUTE NAME="Brand">HP</ATTRIBUTE>
                <ATTRIBUTE NAME="Graphic Card">Intel HD Graphics</ATTRIBUTE>
                <ATTRIBUTE NAME="Hard Drive Size">500 GB</ATTRIBUTE>
                <ATTRIBUTE NAME="Operating System">Windows ®</ATTRIBUTE>
                <ATTRIBUTE NAME="RAM Included">4 GB</ATTRIBUTE>
                <ATTRIBUTE NAME="Screen Size">15.6 in.</ATTRIBUTE>
            </ATTRIBUTES>
        </PRODUCT>

When we convert each product to a single line, we take only the text between, and including, the PRODUCT start and end tags.

When we did this with the new file, it failed because the namespace declarations (which sit on CATALOG) were being dropped. So I modified the process to put a wrapper element carrying the namespace declarations around the product. The text sent to be converted via XSL is now:

<wrapper xmlns="urn:rrXML" xmlns:xsd="httc://www.w3.org/2001/XMLSchema" xmlns:xsi="httc://www.w3.org/2001/XMLSchema-instance">
    <PRODUCTS>
        <PRODUCT ID="692174">
            <PRODUCTNAME>HP Pavilion g6t Laptop 3rd generation Intel® Core™ i5-3210M 2.5GHz SuperMulti 8X DVD+/-R/RW</PRODUCTNAME>
            <PRODUCTDESCRIPTION></PRODUCTDESCRIPTION>
            <PRODUCTSKU>100005487</PRODUCTSKU>
            <LISTPRICE>$499.99</LISTPRICE>
            <SALEPRICE xsi:type="xsd:string">$499.99</SALEPRICE>
            <PRODUCTURL>/.product.100005487.html</PRODUCTURL>
            <IMAGEURL>httc://images.test-static.com/image/media/150-__1</IMAGEURL>
            <RATING xsi:type="xsd:string">0.0</RATING>
            <BRAND>HEWLETT PACKARD</BRAND>
            <INSTOCK>1</INSTOCK>
            <REVIEWS xsi:type="xsd:string">0</REVIEWS>
            <KEYWORDS></KEYWORDS>
            <ACTIONBUTTONURL></ACTIONBUTTONURL>
            <PARENTPRODUCTID>100005487</PARENTPRODUCTID>
            <CATEGORIES>
                <CATEGORY ID="103510">
                    <CATEGORYNAME>Kaspersky Promotion</CATEGORYNAME>
                </CATEGORY>
            </CATEGORIES>
            <ATTRIBUTES>
                <ATTRIBUTE NAME="Categories">Kaspersky Promotion</ATTRIBUTE>
                <ATTRIBUTE NAME="FSA">False</ATTRIBUTE>
                <ATTRIBUTE NAME="HIDEPRICEFROMBROWSE">False</ATTRIBUTE>
                <ATTRIBUTE NAME="ADDTOCARTFROMSEARCH">0</ATTRIBUTE>
                <ATTRIBUTE NAME="ITEMMINQTY">1.0</ATTRIBUTE>
                <ATTRIBUTE NAME="ITEMMAXQTY">1.0</ATTRIBUTE>
                <ATTRIBUTE NAME="MERCHANDISINGDESC"></ATTRIBUTE>
                <ATTRIBUTE NAME="DISCOUNTDESC"></ATTRIBUTE>
                <ATTRIBUTE NAME="ALTTEXT">HP Pavilion g6t Laptop 3rd generation Intel® Core™ i5-3210M 2.5GHz SuperMulti 8X DVD+/-R/RW</ATTRIBUTE>
                <ATTRIBUTE NAME="MAPITEM">False</ATTRIBUTE>
                <ATTRIBUTE NAME="MEMBERONLYITEM">False</ATTRIBUTE>
                <ATTRIBUTE NAME="Brand">HP</ATTRIBUTE>
                <ATTRIBUTE NAME="Graphic Card">Intel HD Graphics</ATTRIBUTE>
                <ATTRIBUTE NAME="Hard Drive Size">500 GB</ATTRIBUTE>
                <ATTRIBUTE NAME="Operating System">Windows ®</ATTRIBUTE>
                <ATTRIBUTE NAME="RAM Included">4 GB</ATTRIBUTE>
                <ATTRIBUTE NAME="Screen Size">15.6 in.</ATTRIBUTE>
            </ATTRIBUTES>
        </PRODUCT>
</wrapper>
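The wrapper fixes one problem and creates another. A minimal sketch, using Python's standard xml.etree.ElementTree as a stand-in for the parser in front of the XSLT step (an assumption; the pipeline's actual parser isn't named). FRAGMENT is a hypothetical, trimmed excerpt of the new feed's PRODUCT line. Parsed on its own it is rejected, because the xsi prefix is now declared only on CATALOG; wrapped, it parses, but the wrapper's xmlns="urn:rrXML" also places every element in that namespace. The last case, a wrapper that declares only xsd and xsi, hints at why the old feed worked: there the default namespace was lost along with CATALOG, leaving PRODUCT and its children in no namespace.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical, trimmed excerpt of the new feed: xsi:type is used, but the
# xsi prefix is declared only on CATALOG, which the extraction step drops.
FRAGMENT = (
    '<PRODUCT ID="692174">'
    '<SALEPRICE xsi:type="xsd:string">$499.99</SALEPRICE>'
    '</PRODUCT>'
)

XSI = "http://www.w3.org/2001/XMLSchema-instance"
XSD = "http://www.w3.org/2001/XMLSchema"


def parse_alone(fragment: str) -> Optional[ET.Element]:
    """Parse the extracted line as-is; return None if the parser rejects it."""
    try:
        return ET.fromstring(fragment)
    except ET.ParseError:  # expat reports "unbound prefix" for xsi:type
        return None


def parse_wrapped(fragment: str, declare_default: bool) -> ET.Element:
    """Wrap the fragment in an element that re-declares the namespaces."""
    default = ' xmlns="urn:rrXML"' if declare_default else ""
    wrapper = f'<wrapper{default} xmlns:xsd="{XSD}" xmlns:xsi="{XSI}">{fragment}</wrapper>'
    return ET.fromstring(wrapper)


if __name__ == "__main__":
    print(parse_alone(FRAGMENT))                  # None: unbound prefix
    print(parse_wrapped(FRAGMENT, True)[0].tag)   # {urn:rrXML}PRODUCT
    print(parse_wrapped(FRAGMENT, False)[0].tag)  # PRODUCT
```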

The XSL I am trying to use is:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="httc://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text" indent="no" />
    <xsl:strip-space elements="*" />
    <xsl:template match="PRODUCT">
        <!-- skuId  --><xsl:value-of select="PRODUCTSKU"/>
        <xsl:text>|||</xsl:text>
        <!-- parentSkuId --><xsl:value-of select="PARENTPRODUCTID"/>
        <xsl:text>|||</xsl:text>
        <!-- globalSkuID  --><xsl:text></xsl:text>
        <xsl:text>|||</xsl:text>
        <!-- TaxonomyKey Path  --><xsl:text></xsl:text>
        <xsl:text>|||</xsl:text>
        <!-- TaxonomyText  --><xsl:text></xsl:text>
        <xsl:text>|||</xsl:text>
        <!-- upc --><xsl:text></xsl:text>
        <xsl:text>|||</xsl:text>
        <!-- mpn --><xsl:text></xsl:text>
        <xsl:text>|||</xsl:text>
        <!-- model_Number  --><xsl:text></xsl:text>
        <xsl:text>|||</xsl:text>
        <!-- Name  --><xsl:value-of select="PRODUCTNAME"/>
        <xsl:text>|||</xsl:text>
        <!-- shortDescription --><xsl:text></xsl:text>
        <xsl:text>|||</xsl:text>
        <!-- longDescription --><xsl:value-of select="PRODUCTDESCRIPTION"/>
        <xsl:text>|||</xsl:text>
        <!-- price --><xsl:value-of select="SALEPRICE"/>
        <xsl:text>|||</xsl:text>
        <!-- comparePrice --><xsl:value-of select="LISTPRICE"/>
        <xsl:text>|||</xsl:text>
        <!-- productPage --><xsl:value-of select="PRODUCTURL"/>
        <xsl:text>|||</xsl:text>
        <!-- thumbnail --><xsl:value-of select="IMAGEURL"/>
        <xsl:text>|||</xsl:text>
        <!-- fullImage --><xsl:value-of select="IMAGEURL"/>
        <xsl:text>|||</xsl:text>
        <!-- rating --><xsl:value-of select="RATING"/>
        <xsl:text>|||</xsl:text>
        <!-- brand --><xsl:value-of select="BRAND"/>
        <xsl:text>|||</xsl:text>
        <!-- isActive --><xsl:value-of select="INSTOCK"/>
        <xsl:text>|||</xsl:text>
        <!-- ReviewCount --><xsl:value-of select="REVIEWS"/>
        <xsl:text>|||</xsl:text>
        <!-- AlternateTaxonomyKeys -->
        <xsl:for-each select="CATEGORIES/CATEGORY">
            <xsl:value-of select="@ID" /><xsl:text>^</xsl:text>
        </xsl:for-each>
        <xsl:text>|||</xsl:text>
        <!-- AlternateTaxonomyNames -->
        <xsl:for-each select="CATEGORIES/CATEGORY/CATEGORYNAME">
            <xsl:value-of select="." /><xsl:text>^</xsl:text>
        </xsl:for-each>
        <xsl:text>|||</xsl:text>
        <!-- AttributeNames -->
        <xsl:for-each select="ATTRIBUTES/ATTRIBUTE">
            <xsl:value-of select="@NAME" /><xsl:text>^</xsl:text>
        </xsl:for-each>
        <xsl:text>|||</xsl:text>
        <!-- Attribute Values -->
        <xsl:for-each select="ATTRIBUTES/ATTRIBUTE">
            <xsl:value-of select="." /><xsl:text>^</xsl:text>
        </xsl:for-each>
        <xsl:text>&#xa;</xsl:text>

    </xsl:template>

</xsl:stylesheet>

The result is just the text of the PRODUCT node's children, concatenated with no delimiters, like: HP Pavilion g6t Laptop 3rd generation Intel® Core™ i5-3210M 2.5GHz SuperMulti 8X DVD+/-R/RW100005487$499.99$499.99/.product.100005487.htmlhttc://images.test-static.com/image/media/150-__10.0HEWLETT PACKARD10100005487

I'm guessing it has something to do with the namespaces they are including, but I don't know enough about XSL to figure out what. Please help.
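The namespace guess is right, and the mechanism can be sketched outside XSLT. In XPath 1.0 an unprefixed name such as PRODUCT only matches elements in no namespace, so once the wrapper's default namespace puts every element in urn:rrXML, match="PRODUCT" never fires and XSLT's built-in template rules simply copy out every text node, which is the concatenated string above. Python's standard xml.etree.ElementTree (used here only as an analogue, not as the XSLT processor) treats its path expressions the same way; WRAPPED is a hypothetical, trimmed version of the wrapper input.

```python
import xml.etree.ElementTree as ET

NS_URI = "urn:rrXML"

# Hypothetical, trimmed version of the wrapper input from the question.
WRAPPED = (
    '<wrapper xmlns="urn:rrXML"><PRODUCTS><PRODUCT ID="692174">'
    '<PRODUCTSKU>100005487</PRODUCTSKU>'
    '<BRAND>HEWLETT PACKARD</BRAND>'
    '</PRODUCT></PRODUCTS></wrapper>'
)

root = ET.fromstring(WRAPPED)

# Unprefixed path: looks for PRODUCT in *no* namespace, so nothing matches
# (the same reason match="PRODUCT" never fires in the XSLT 1.0 stylesheet).
unprefixed = root.find("PRODUCTS/PRODUCT")

# Prefixed path: the prefix "u" is bound to the feed's namespace URI.
ns = {"u": NS_URI}
prefixed = root.find("u:PRODUCTS/u:PRODUCT", ns)

print(unprefixed)                               # None
print(prefixed.tag)                             # {urn:rrXML}PRODUCT
print(prefixed.find("u:PRODUCTSKU", ns).text)   # 100005487

# Roughly what the built-in template rules emit when no template matches:
# every text node, in document order, with nothing in between.
print("".join(root.itertext()))                 # 100005487HEWLETT PACKARD
```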

Tags: xml xslt xsd
2 answers
时光不老,我们不散
Post #2 · 2019-09-21 15:14

You have to add the namespace of the XML document to the XSLT by declaring a prefix bound to the same namespace URI, e.g. xmlns:u="urn:rrXML". Then you can access the elements of the XML through this prefix, i.e. you get the value using <xsl:value-of select="u:PRODUCTSKU"/> instead of <xsl:value-of select="PRODUCTSKU"/>. Once the missing closing PRODUCTS tag is added to your input XML, the following XSLT

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:u="urn:rrXML" 
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" indent="no" />
<xsl:strip-space elements="*" />

  <xsl:template match="u:PRODUCT" >
    <!-- skuId  --><xsl:value-of select="u:PRODUCTSKU"/>
    <xsl:text>|||</xsl:text>
    <!-- parentSkuId --><xsl:value-of select="u:PARENTPRODUCTID"/>
    <xsl:text>|||</xsl:text>
    <!-- globalSkuID  --><xsl:text></xsl:text>
    <xsl:text>|||</xsl:text>
    <!-- TaxonomyKey Path  --><xsl:text></xsl:text>
    <xsl:text>|||</xsl:text>
    <!-- TaxonomyText  --><xsl:text></xsl:text>
    <xsl:text>|||</xsl:text>
    <!-- upc --><xsl:text></xsl:text>
    <xsl:text>|||</xsl:text>
    <!-- mpn --><xsl:text></xsl:text>
    <xsl:text>|||</xsl:text>
    <!-- model_Number  --><xsl:text></xsl:text>
    <xsl:text>|||</xsl:text>
    <!-- Name  --><xsl:value-of select="u:PRODUCTNAME"/>
    <xsl:text>|||</xsl:text>
    <!-- shortDescription --><xsl:text></xsl:text>
    <xsl:text>|||</xsl:text>
    <!-- longDescription --><xsl:value-of select="u:PRODUCTDESCRIPTION"/>
    <xsl:text>|||</xsl:text>
    <!-- price --><xsl:value-of select="u:SALEPRICE" />
    <xsl:text>|||</xsl:text>
    <!-- comparePrice --><xsl:value-of select="u:LISTPRICE"/>
    <xsl:text>|||</xsl:text>
    <!-- productPage --><xsl:value-of select="u:PRODUCTURL"/>
    <xsl:text>|||</xsl:text>
    <!-- thumbnail --><xsl:value-of select="u:IMAGEURL"/>
    <xsl:text>|||</xsl:text>
    <!-- fullImage --><xsl:value-of select="u:IMAGEURL"/>
    <xsl:text>|||</xsl:text>
    <!-- rating --><xsl:value-of select="u:RATING"/>
    <xsl:text>|||</xsl:text>
    <!-- brand --><xsl:value-of select="u:BRAND"/>
    <xsl:text>|||</xsl:text>
    <!-- isActive --><xsl:value-of select="u:INSTOCK"/>
    <xsl:text>|||</xsl:text>
    <!-- ReviewCount --><xsl:value-of select="u:REVIEWS"/>
    <xsl:text>|||</xsl:text>
    <!-- AlternateTaxonomyKeys -->
    <xsl:for-each select="u:CATEGORIES/u:CATEGORY">
        <xsl:value-of select="@ID" /><xsl:text>^</xsl:text>
    </xsl:for-each>
    <xsl:text>|||</xsl:text>
    <!-- AlternateTaxonomyNames -->
    <xsl:for-each select="u:CATEGORIES/u:CATEGORY/u:CATEGORYNAME">
        <xsl:value-of select="." /><xsl:text>^</xsl:text>
    </xsl:for-each>
    <xsl:text>|||</xsl:text>
    <!-- AttributeNames -->
    <xsl:for-each select="u:ATTRIBUTES/u:ATTRIBUTE">
        <xsl:value-of select="@NAME" /><xsl:text>^</xsl:text>
    </xsl:for-each>
    <xsl:text>|||</xsl:text>
    <!-- Attribute Values -->
    <xsl:for-each select="u:ATTRIBUTES/u:ATTRIBUTE">
        <xsl:value-of select="." /><xsl:text>^</xsl:text>
    </xsl:for-each>
    <xsl:text>&#xa;</xsl:text>
  </xsl:template>
</xsl:stylesheet>

produces the output
100005487|||100005487|||||||||||||||||||||HP Pavilion g6t Laptop 3rd generation Intel® Core™ i5-3210M 2.5GHz SuperMulti 8X DVD+/-R/RW|||||||||$499.99|||$499.99|||/.product.100005487.html|||httc://images.test-static.com/image/media/150-__1|||httc://images.test-static.com/image/media/150-__1|||0.0|||HEWLETT PACKARD|||1|||0|||103510^|||Kaspersky Promotion^|||Categories^FSA^HIDEPRICEFROMBROWSE^ADDTOCARTFROMSEARCH^ITEMMINQTY^ITEMMAXQTY^MERCHANDISINGDESC^DISCOUNTDESC^ALTTEXT^MAPITEM^MEMBERONLYITEM^Brand^Graphic Card^Hard Drive Size^Operating System^RAM Included^Screen Size^|||Kaspersky Promotion^False^False^0^1.0^1.0^^^HP Pavilion g6t Laptop 3rd generation Intel® Core™ i5-3210M 2.5GHz SuperMulti 8X DVD+/-R/RW^False^False^HP^Intel HD Graphics^500 GB^Windows ®^4 GB^15.6 in.^

all on one line, if that's really the intended output.
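To check the field mapping outside XSLT, here is a minimal Python analogue of the corrected stylesheet using the standard xml.etree.ElementTree module. It covers only a few columns in stylesheet order (skuId, parentSkuId, Name, brand and the two ATTRIBUTE columns) and skips the rest; the helper names text_of and product_to_line and the trimmed input are hypothetical. As with u: in the answer, every element path carries a prefix bound to urn:rrXML, while the NAME attribute stays unprefixed, because unprefixed attributes are never in the default namespace (which is also why @ID and @NAME work unchanged in the XSLT).

```python
import xml.etree.ElementTree as ET

NS = {"u": "urn:rrXML"}
SEP = "|||"

# Hypothetical, trimmed product in the urn:rrXML default namespace.
XML = (
    '<wrapper xmlns="urn:rrXML"><PRODUCTS><PRODUCT ID="692174">'
    '<PRODUCTNAME>HP Pavilion g6t Laptop</PRODUCTNAME>'
    '<PRODUCTSKU>100005487</PRODUCTSKU>'
    '<BRAND>HEWLETT PACKARD</BRAND>'
    '<PARENTPRODUCTID>100005487</PARENTPRODUCTID>'
    '<ATTRIBUTES>'
    '<ATTRIBUTE NAME="Brand">HP</ATTRIBUTE>'
    '<ATTRIBUTE NAME="RAM Included">4 GB</ATTRIBUTE>'
    '</ATTRIBUTES>'
    '</PRODUCT></PRODUCTS></wrapper>'
)


def text_of(product: ET.Element, path: str) -> str:
    """Like xsl:value-of: string value of the first match, '' if none."""
    node = product.find(path, NS)
    return "" if node is None else "".join(node.itertext())


def product_to_line(product: ET.Element) -> str:
    """Build one triple-pipe-delimited line for a PRODUCT element."""
    attrs = product.findall("u:ATTRIBUTES/u:ATTRIBUTE", NS)
    fields = [
        text_of(product, "u:PRODUCTSKU"),       # skuId
        text_of(product, "u:PARENTPRODUCTID"),  # parentSkuId
        text_of(product, "u:PRODUCTNAME"),      # Name
        text_of(product, "u:BRAND"),            # brand
        # NAME is unprefixed, so it is in no namespace (like @NAME in XSLT).
        "".join(a.get("NAME", "") + "^" for a in attrs),       # AttributeNames
        "".join("".join(a.itertext()) + "^" for a in attrs),  # Attribute Values
    ]
    return SEP.join(fields)


root = ET.fromstring(XML)
for product in root.iterfind("u:PRODUCTS/u:PRODUCT", NS):
    print(product_to_line(product))
```

Running it prints 100005487|||100005487|||HP Pavilion g6t Laptop|||HEWLETT PACKARD|||Brand^RAM Included^|||HP^4 GB^ for the trimmed input.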

叛逆
Post #3 · 2019-09-21 15:38

I'm guessing it has something to do with the namespaces they are including but I don't really know enough about using xsl to figure out what.

You guessed correctly, and a short search would have revealed the answer: bind a prefix to the namespace and use that prefix when addressing the elements of the XML source, for example:

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:rrx="urn:rrXML">
<xsl:strip-space elements="*" />
<xsl:output method="text"/>

<xsl:template match="rrx:PRODUCT">
    <!-- skuId  --><xsl:value-of select="rrx:PRODUCTSKU"/>
    <xsl:text>|||</xsl:text>
    <!-- parentSkuId --><xsl:value-of select="rrx:PARENTPRODUCTID"/>
    <xsl:text>|||</xsl:text>

    <!-- etc.  -->

    <xsl:text>&#xa;</xsl:text>
</xsl:template>

</xsl:stylesheet>
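The two answers pick different prefixes (u and rrx), and either works: only the namespace URI a prefix is bound to matters, not the prefix itself. The same holds on the source side, so a feed that wrote explicit rr: prefixes for urn:rrXML would still match. A small check with Python's standard xml.etree.ElementTree (an analogue of the XPath matching, not XSLT itself; the documents and the sku helper are hypothetical) illustrates this.

```python
import xml.etree.ElementTree as ET

# Two hypothetical spellings of the same element: default namespace vs.
# an explicit prefix. Both put PRODUCTSKU in the urn:rrXML namespace.
DOCS = [
    '<PRODUCT xmlns="urn:rrXML"><PRODUCTSKU>100005487</PRODUCTSKU></PRODUCT>',
    '<rr:PRODUCT xmlns:rr="urn:rrXML"><rr:PRODUCTSKU>100005487</rr:PRODUCTSKU></rr:PRODUCT>',
]

# Prefix names in the query are local to the query; only the URI must agree.
QUERY_PREFIXES = [{"u": "urn:rrXML"}, {"rrx": "urn:rrXML"}]


def sku(doc: str, ns: dict) -> str:
    """Look up PRODUCTSKU using the single prefix defined in ns."""
    prefix = next(iter(ns))
    return ET.fromstring(doc).find(f"{prefix}:PRODUCTSKU", ns).text


results = {sku(d, ns) for d in DOCS for ns in QUERY_PREFIXES}
print(results)  # {'100005487'}
```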