Am Trying to convert an XML data into CSV with XSLT. The rows are separated with commas but some of the data have double quotes. i used the following codes for the conversion but it doesn't handle the data properly, especially the rows with quotes.
this is my sample data
<Add>
<Rowinfo>
<LocatorD>Dwelling </LocatorD>
<LName> shark </LName>
<L>1</L>
<AArea>Abesinia Passage</AArea>
</Rowinfo>
when the XSL is applied to the above data it produces
LocatorDesignator,LocatorName, Locator , Thoroughfare , AddressArea
Dwelling , shark , 1 , Abesinia Passage,
Shop 01-Feb,Shop , 1 , Casenapes Square , ,
But the intended result is to produce
LocatorDesignator,LocatorName,Locator, Thoroughfare , AddressArea
Dwelling , shark , 1 , Abesinia Passage ,
Shop 01-Feb , Shop , 1 , Casenapes Square ,
In other words when you open this as a CSV file
instead of
LocatorDesignator| LocatorName
Shop 01-Feb | Shop
This XSLT 1.0 style-sheet...
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:apply-templates select="*/Rowinfo[1]/*" mode="heading" />
<xsl:value-of select="'
'" />
<xsl:apply-templates select="*/Rowinfo" />
</xsl:template>
<xsl:template match="Rowinfo/*" mode="heading" >
<xsl:value-of select="local-name()" />
<xsl:if test="position() != last()">
<xsl:value-of select="','" />
</xsl:if>
</xsl:template>
<xsl:template match="Rowinfo">
<xsl:variable name="line-with-extra-comma">
<xsl:for-each select="*">
<xsl:variable name="col-name" select="local-name()" />
<xsl:if test="../../Rowinfo[1]/*[local-name() = $col-name]">
<xsl:call-template name="csv-encode" />
<xsl:value-of select="','" />
</xsl:if>
</xsl:for-each>
</xsl:variable>
<xsl:value-of select="concat(
substring($line-with-extra-comma, 1,
string-length($line-with-extra-comma) - 1),
'
')" />
</xsl:template>
<xsl:template name="escape-value">
<xsl:param name="text" />
<xsl:choose>
<xsl:when test="contains($text,'"')">
<xsl:value-of select="concat( substring-before($text,'"'), '""')" />
<xsl:call-template name="escape-value">
<xsl:with-param name="text" select="substring-after($text,'"')" />
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$text" />
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template name="csv-encode">
<xsl:choose>
<xsl:when test="contains(.,',') or starts-with(.,'"')">
<xsl:value-of select="'"'" />
<xsl:call-template name="escape-value">
<xsl:with-param name="text" select="text()" />
</xsl:call-template>
<xsl:value-of select="'"'" />
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="." />
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
...will take this input document...
<Address>
<Rowinfo>
<LocatorDesignator>Dwelling </LocatorDesignator>
<LocatorName> shark </LocatorName>
<Locator>1</Locator>
<AddressArea>Abesinia Passage</AddressArea>
</Rowinfo>
<Rowinfo>
<LocatorDesignator>"Shop 01-Feb</LocatorDesignator>
<LocatorName>"Shop</LocatorName>
<Locator>1</Locator>
<Thoroughfare>Casenapes Square</Thoroughfare>
<AddressArea/>
</Rowinfo>
</Address>
...and transform it into this csv output...
LocatorDesignator,LocatorName,Locator,AddressArea
Dwelling , shark ,1,Abesinia Passage
"""Shop 01-Feb","""Shop",1,
Caveats
I have assumed that:
- Column headings are defined by the child elements of the first row. If it is possible for table to be empty (no rows), you will need to adjust accordingly.
- Elements by name appear in the same order in subsequent rows as they do in the first row.
- Subsequent rows may contain extraneous child elements, but never missing ones. Extraneous elements are dropped.
- Csv output is proper csv output. Values are double-quote escaped if either they contain a comma or start with a double-quote.
- All values are single-line. Multi-line csv is not handled by this script.
- The output line terminator is LF. If you need CR.LF or something else, adjust accordingly.