I have a pipe-delimited text file, shown below, which I need to transform into a well-formed XML structure (example shown below) using XSL. The XSL below is my latest attempt at solving this; however, I cannot find a way to nest the level 002 elements inside level 001, i.e. maintain the parent-child relationship, when iterating through the file line by line. Could anyone help here?
Pipe delimited file - input
001|XXX|YYY
002|AAA|BBB
002|CCC|DD
001|EEF|XXX
002|HHH|GGG
XML File - desired output
<root>
<level001>
<elem name="field1">001</elem>
<elem name="field2">XXX</elem>
<elem name="field3">YYY</elem>
<level002>
<elem name="field1">002</elem>
<elem name="field2">AAA</elem>
<elem name="field3">BBB</elem>
</level002>
<level002>
<elem name="field1">002</elem>
<elem name="field2">CCC</elem>
<elem name="field3">DD</elem>
</level002>
</level001>
<level001>
<elem name="field1">001</elem>
<elem name="field2">EEF</elem>
<elem name="field3">XXX</elem>
<level002>
<elem name="field1">002</elem>
<elem name="field2">HHH</elem>
<elem name="field3">GGG</elem>
</level002>
</level001>
</root>
Current XSL
<xsl:variable name="Cols">
<col>field1,1</col>
<col>field2,2</col>
<col>field3,3</col>
</xsl:variable>
<xsl:template match="/" name="main">
<xsl:choose>
<xsl:when test="unparsed-text-available($pathToCSV, $encoding)">
<xsl:variable name="csv" select="unparsed-text($pathToCSV, $encoding)" />
<xsl:variable name="lines" select="tokenize($csv, '\n')" as="xs:string+" />
<root>
<xsl:for-each select="$lines[position() > 0]">
<xsl:if test="translate(., '  	 ', '') != ''">
<level001>
<xsl:variable name="line" select="." />
<xsl:variable name="columns" select="tokenize(.,'\|')" as="xs:string+"/>
<xsl:choose>
<xsl:when test="$columns[1]='001'">
<xsl:for-each select="$Cols/col">
<xsl:variable name="column" select="number(substring-after(.,','))"/>
<elem name="{substring-before(.,',')}">
<!-- trims the whitespace from the beginning and the ending of the value -->
<xsl:value-of select="replace(replace($columns[$column],'\s+$',''),'^\s+','')"/>
</elem>
</xsl:for-each>
</xsl:when>
<xsl:when test="$columns[1]='002'">
<level002>
<xsl:for-each select="$Cols/col">
<xsl:variable name="column" select="number(substring-after(.,','))"/>
<elem name="{substring-before(.,',')}">
<!-- trims the whitespace from the beginning and the ending of the value -->
<xsl:value-of select="replace(replace($columns[$column],'\s+$',''),'^\s+','')"/>
</elem>
</xsl:for-each>
</level002>
</xsl:when>
</xsl:choose>
</level001>
</xsl:if>
</xsl:for-each>
</root>
</xsl:when>
</xsl:choose>
</xsl:template>
You can find a solution to essentially the same problem here:
http://www.saxonica.com/papers/ideadb-1.1/mhk-paper.xml
The core is a recursive grouping template:
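As a minimal sketch of what such a recursive grouping template could look like for this data (it is not necessarily the code from the linked paper), the stylesheet below first turns each line into a flat line element and then groups level by level. The template name group-level, the intermediate line element, and the default parameter values are assumptions made for illustration; pathToCSV and encoding are the names used in the question.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Parameter names follow the question; the default values are placeholders -->
  <xsl:param name="pathToCSV" as="xs:string" select="'input.txt'"/>
  <xsl:param name="encoding" as="xs:string" select="'UTF-8'"/>
  <xsl:output indent="yes"/>

  <xsl:template name="main">
    <!-- Temporary tree with one <line> per non-blank line of the text file -->
    <xsl:variable name="flat">
      <xsl:for-each select="tokenize(unparsed-text($pathToCSV, $encoding), '\r?\n')[normalize-space(.)]">
        <line>
          <xsl:for-each select="tokenize(., '\|')">
            <elem name="field{position()}"><xsl:value-of select="normalize-space(.)"/></elem>
          </xsl:for-each>
        </line>
      </xsl:for-each>
    </xsl:variable>
    <root>
      <xsl:call-template name="group-level">
        <xsl:with-param name="lines" select="$flat/line"/>
        <xsl:with-param name="level" select="1"/>
      </xsl:call-template>
    </root>
  </xsl:template>

  <!-- Hypothetical helper: groups $lines at $level, then recurses into each group -->
  <xsl:template name="group-level">
    <xsl:param name="lines" as="element(line)*"/>
    <xsl:param name="level" as="xs:integer"/>
    <!-- Level 1 becomes the record code '001', level 2 becomes '002', and so on -->
    <xsl:variable name="code" select="format-number($level, '000')"/>
    <xsl:for-each-group select="$lines" group-starting-with="line[elem[1] = $code]">
      <xsl:element name="level{$code}">
        <!-- Fields of the line that opened this group -->
        <xsl:copy-of select="elem"/>
        <!-- The remaining lines of the group belong to deeper levels -->
        <xsl:call-template name="group-level">
          <xsl:with-param name="lines" select="current-group()[position() gt 1]"/>
          <xsl:with-param name="level" select="$level + 1"/>
        </xsl:call-template>
      </xsl:element>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

The recursion stops on its own because every call passes on at least one line fewer than it received. With only two record types it does the same work as a single grouping pass, but it would also handle a 003 level without further changes.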
Well, you're iterating over every line and already closing the level001 tag when finished with the line. Why not try something like (pseudo-code):
<level001>
<level002>
</level002>
</level001>
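One way to turn that pseudo-code into a stylesheet, sketched here under some assumptions, is to iterate over the positions of the 001 lines only and, inside each level001 element, loop over the lines up to the next 001 line. The fields template and the variables starts, start, and end are names introduced for this illustration; pathToCSV and encoding come from the question, and their default values are placeholders.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:param name="pathToCSV" as="xs:string" select="'input.txt'"/>
  <xsl:param name="encoding" as="xs:string" select="'UTF-8'"/>
  <xsl:output indent="yes"/>

  <xsl:template name="main">
    <!-- All non-blank lines of the text file -->
    <xsl:variable name="lines" as="xs:string*"
        select="tokenize(unparsed-text($pathToCSV, $encoding), '\r?\n')[normalize-space(.)]"/>
    <!-- Positions of the lines whose first field is 001 -->
    <xsl:variable name="starts" as="xs:integer*"
        select="index-of(for $l in $lines return tokenize($l, '\|')[1], '001')"/>
    <root>
      <xsl:for-each select="$starts">
        <xsl:variable name="p" select="position()"/>
        <xsl:variable name="start" as="xs:integer" select="."/>
        <!-- The block ends just before the next 001 line, or after the last line -->
        <xsl:variable name="end" as="xs:integer"
            select="($starts[$p + 1], count($lines) + 1)[1]"/>
        <level001>
          <xsl:call-template name="fields">
            <xsl:with-param name="line" select="$lines[$start]"/>
          </xsl:call-template>
          <!-- Every line between this 001 line and the next one becomes a child -->
          <xsl:for-each select="$lines[position() gt $start and position() lt $end]">
            <level002>
              <xsl:call-template name="fields">
                <xsl:with-param name="line" select="."/>
              </xsl:call-template>
            </level002>
          </xsl:for-each>
        </level001>
      </xsl:for-each>
    </root>
  </xsl:template>

  <!-- Hypothetical helper: writes one <elem> per pipe-separated field -->
  <xsl:template name="fields">
    <xsl:param name="line" as="xs:string"/>
    <xsl:for-each select="tokenize($line, '\|')">
      <elem name="field{position()}"><xsl:value-of select="normalize-space(.)"/></elem>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

This sketch treats every line between two 001 lines as a level002 child without checking its record code, and it ignores any lines that appear before the first 001 line.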
I would first transform the flat text into a flat XML structure and then group that with for-each-group group-starting-with. When I apply such a stylesheet with Saxon 9 using
java -jar saxon9he.jar -it:main -xsl:sheet.xsl
I get the nested structure asked for in the question. The stylesheet has a parameter named text-url, which you can set to the URL of the plain text file when running the stylesheet.
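The two steps described above (flat XML first, then for-each-group with group-starting-with) might be sketched as follows. This is a reconstruction under assumptions rather than the answer's original listing: the intermediate line element and the default value of text-url are made up for illustration, while the main template name and the text-url parameter name follow the answer.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- URL of the pipe-delimited file; the default value is a placeholder -->
  <xsl:param name="text-url" as="xs:string" select="'input.txt'"/>
  <xsl:output indent="yes"/>

  <xsl:template name="main">
    <!-- Step 1: flat XML, one <line> per non-blank line of the text file -->
    <xsl:variable name="flat">
      <xsl:for-each select="tokenize(unparsed-text($text-url), '\r?\n')[normalize-space(.)]">
        <line>
          <xsl:for-each select="tokenize(., '\|')">
            <elem name="field{position()}">
              <xsl:value-of select="normalize-space(.)"/>
            </elem>
          </xsl:for-each>
        </line>
      </xsl:for-each>
    </xsl:variable>

    <!-- Step 2: every 001 line starts a new group; the lines after it are its children -->
    <root>
      <xsl:for-each-group select="$flat/line" group-starting-with="line[elem[1] = '001']">
        <level001>
          <!-- The context item is the first line of the group, i.e. the 001 line -->
          <xsl:copy-of select="elem"/>
          <xsl:for-each select="current-group()[position() gt 1]">
            <level002>
              <xsl:copy-of select="elem"/>
            </level002>
          </xsl:for-each>
        </level001>
      </xsl:for-each-group>
    </root>
  </xsl:template>
</xsl:stylesheet>
```

With Saxon, a stylesheet parameter can be supplied as a name=value pair at the end of the command line, for example java -jar saxon9he.jar -it:main -xsl:sheet.xsl text-url=input.txt. A relative URL is resolved against the location of the stylesheet. The sketch assumes the file begins with a 001 line; any lines before the first 001 line would otherwise end up in a level001 group of their own.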