Issues with comma separated data with XSLT

2019-08-28 20:27发布

Am Trying to convert an XML data into CSV with XSLT. The rows are separated with commas but some of the data have double quotes. i used the following codes for the conversion but it doesn't handle the data properly, especially the rows with quotes.

this is my sample data

<Add>
<Rowinfo>
<LocatorD>Dwelling  </LocatorD>
<LName> shark </LName>
<L>1</L>
<AArea>Abesinia Passage</AArea>
</Rowinfo>

when the XSL is applied to the above data it produces

LocatorDesignator,LocatorName,     Locator      ,  Thoroughfare     ,      AddressArea

Dwelling         ,     shark ,       1          ,   Abesinia Passage,

Shop 01-Feb,Shop ,       1   , Casenapes Square ,                   ,

But the intended result is to produce

LocatorDesignator,LocatorName,Locator,   Thoroughfare      ,       AddressArea

Dwelling         ,     shark ,  1    ,   Abesinia Passage  ,

Shop 01-Feb      ,     Shop  ,  1    ,    Casenapes Square ,

In other words when you open this as a CSV file

  • Shop 01-Feb,Shop happend to be in a single column
  • instead of seperate columns such as:

    LocatorDesignator| LocatorName

    Shop 01-Feb,shop|

instead of

LocatorDesignator| LocatorName
Shop 01-Feb      | Shop

标签: xml xslt xpath
1条回答
Emotional °昔
2楼-- · 2019-08-28 21:20

This XSLT 1.0 style-sheet...

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>

<xsl:template match="/">
   <xsl:apply-templates select="*/Rowinfo[1]/*" mode="heading" />
   <xsl:value-of select="'&#x0A;'" />
   <xsl:apply-templates select="*/Rowinfo" />
</xsl:template>

<xsl:template match="Rowinfo/*" mode="heading" >
 <xsl:value-of select="local-name()" />
 <xsl:if test="position() != last()">  
  <xsl:value-of select="','" />
 </xsl:if>  
</xsl:template>

<xsl:template match="Rowinfo">
  <xsl:variable name="line-with-extra-comma">
   <xsl:for-each select="*">
    <xsl:variable name="col-name" select="local-name()" />
    <xsl:if test="../../Rowinfo[1]/*[local-name() = $col-name]">
      <xsl:call-template name="csv-encode" />
      <xsl:value-of select="','" />
    </xsl:if>  
   </xsl:for-each>
  </xsl:variable> 
 <xsl:value-of select="concat(
      substring($line-with-extra-comma, 1,
      string-length($line-with-extra-comma) - 1),
    '&#x0A;')" />
</xsl:template>

<xsl:template name="escape-value">
 <xsl:param name="text" />
 <xsl:choose>
  <xsl:when test="contains($text,'&quot;')">
    <xsl:value-of select="concat( substring-before($text,'&quot;'), '&quot;&quot;')" />
    <xsl:call-template name="escape-value">
      <xsl:with-param name="text" select="substring-after($text,'&quot;')" />
    </xsl:call-template>  
  </xsl:when>
  <xsl:otherwise>
   <xsl:value-of select="$text" /> 
  </xsl:otherwise>  
 </xsl:choose>  
</xsl:template>

<xsl:template name="csv-encode">
 <xsl:choose>
  <xsl:when test="contains(.,',') or starts-with(.,'&quot;')">
   <xsl:value-of select="'&quot;'" />
    <xsl:call-template name="escape-value">
     <xsl:with-param name="text" select="text()" /> 
    </xsl:call-template>  
   <xsl:value-of select="'&quot;'" /> 
  </xsl:when>
  <xsl:otherwise>
   <xsl:value-of select="." /> 
  </xsl:otherwise>  
 </xsl:choose>  
</xsl:template>  

</xsl:stylesheet>

...will take this input document...

<Address>
    <Rowinfo>
        <LocatorDesignator>Dwelling  </LocatorDesignator>
        <LocatorName> shark </LocatorName>
        <Locator>1</Locator>
        <AddressArea>Abesinia Passage</AddressArea>
    </Rowinfo>
    <Rowinfo>
        <LocatorDesignator>"Shop 01-Feb</LocatorDesignator>
        <LocatorName>"Shop</LocatorName>
        <Locator>1</Locator>
        <Thoroughfare>Casenapes Square</Thoroughfare>
        <AddressArea/>
    </Rowinfo>
</Address>

...and transform it into this csv output...

LocatorDesignator,LocatorName,Locator,AddressArea
Dwelling  , shark ,1,Abesinia Passage
"""Shop 01-Feb","""Shop",1,

Caveats

I have assumed that:

  1. Column headings are defined by the child elements of the first row. If it is possible for table to be empty (no rows), you will need to adjust accordingly.
  2. Elements by name appear in the same order in subsequent rows as they do in the first row.
  3. Subsequent rows may contain extraneous child elements, but never missing ones. Extraneous elements are dropped.
  4. Csv output is proper csv output. Values are double-quote escaped if either they contain a comma or start with a double-quote.
  5. All values are single-line. Multi-line csv is not handled by this script.
  6. The output line terminator is LF. If you need CR.LF or something else, adjust accordingly.
查看更多
登录 后发表回答