XML:
<sample>
<test>
<Cell1>John</Cell1>
<Cell2>A</Cell2>
<Cell4>xy</Cell4>
</test>
<test>
<Cell1>John</Cell1>
<Cell2>B</Cell2>
<Cell6>10</Cell6>
</test>
<test>
<Cell1>John,Jade</Cell1>
<Cell2>A,Y</Cell2>
<Cell4>1</Cell4>
</test>
<test>
<Cell1>John,Jade</Cell1>
<Cell2>A C,X</Cell2>
</test>
<test>
<Cell1>John,Jade</Cell1>
<Cell2>C D,Y</Cell2>
</test>
<test>
<Cell1>John</Cell1>
<Cell2>A B</Cell2>
<Cell4>xy</Cell4>
</test>
</sample>
XSLT:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">
<xsl:output method="xml" encoding="UTF-8" indent="no"/>
<xsl:template match="/">
<xsl:apply-templates select="sample"/>
</xsl:template>
<xsl:template match="sample">
<xsl:variable name="atomictest">
<!-- Store the test elements whose Cell2 holds a single value (no comma and no space) -->
<xsl:copy-of select="test[not(contains(Cell2,',')) and not(contains(Cell2,' '))]"/>
</xsl:variable>
<xsl:variable name="copy">
<xsl:apply-templates select="test">
<xsl:with-param name="atomictest" select="$atomictest"/>
</xsl:apply-templates>
</xsl:variable>
</xsl:template>
<xsl:template match="test">
<xsl:param name="atomictest"/>
<xsl:choose>
<xsl:when test="contains(Cell2,',')">
<xsl:variable name="Cell1">
<xsl:copy-of select="Cell1"/>
</xsl:variable>
<!-- tokenize cell2 based on comma -->
<xsl:for-each select="tokenize(Cell2,',')">
<xsl:variable name="str">
<xsl:value-of select="."/>
</xsl:variable>
<xsl:variable name="pos">
<xsl:value-of select="position()"/>
</xsl:variable>
<xsl:choose>
<!-- If cell2 contains space -->
<xsl:when test="contains(.,' ')">
<!-- tokenize the current comma-separated token on spaces -->
<xsl:for-each select="tokenize(.,' ')">
<xsl:variable name="str">
<xsl:value-of select="."/>
</xsl:variable>
<!-- only continue if this Cell2 value is not among the atomic rows collected -->
<xsl:if test="not($atomictest/test[normalize-space(Cell2/text())=normalize-space($str)])">
<!--Store Cell2 value -->
<xsl:variable name="Cell2">
<xsl:value-of select="."/>
</xsl:variable>
<!-- tokenize cell1-->
<xsl:for-each select="tokenize($Cell1/Cell1,',')">
<xsl:if test="position()=$pos">
<test>
<Cell1>
<xsl:value-of select="."/>
</Cell1>
<Cell2>
<xsl:value-of select="$Cell2"/>
</Cell2>
</test>
</xsl:if>
</xsl:for-each>
</xsl:if>
</xsl:for-each>
</xsl:when>
<xsl:otherwise>
<!-- the current Cell2 token does not contain a space -->
<xsl:if test="not($atomictest/test[normalize-space(Cell2/text())=normalize-space($str)])">
<xsl:variable name="Cell2">
<xsl:value-of select="."/>
</xsl:variable>
<xsl:for-each select="tokenize($Cell1/Cell1,',')">
<xsl:if test="position()=$pos">
<test>
<Cell1>
<xsl:value-of select="."/>
</Cell1>
<Cell2>
<xsl:value-of select="$Cell2"/>
</Cell2>
</test>
</xsl:if>
</xsl:for-each>
</xsl:if>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
</xsl:when>
<xsl:when test="contains(Cell2,' ')">
<xsl:variable name="Cell1">
<xsl:copy-of select="Cell1"/>
</xsl:variable>
<!-- tokenize Cell2 on spaces (it contains no comma in this branch) -->
<xsl:for-each select="tokenize(Cell2,' ')">
<xsl:variable name="str">
<xsl:value-of select="."/>
</xsl:variable>
<xsl:variable name="pos">
<xsl:value-of select="position()"/>
</xsl:variable>
<!-- only continue if this Cell2 value is not among the atomic rows collected -->
<xsl:if test="not($atomictest/test[normalize-space(Cell2/text())=normalize-space($str)])">
<xsl:if test="position()=$pos">
<test>
<Cell1>
<xsl:value-of select="$Cell1"/>
</Cell1>
<Cell2>
<xsl:value-of select="$str"/>
</Cell2>
</test>
</xsl:if>
</xsl:if>
</xsl:for-each>
</xsl:when>
<xsl:otherwise>
<test>
<Cell1>
<xsl:value-of select="Cell1"/>
</Cell1>
<Cell2>
<xsl:value-of select="Cell2"/>
</Cell2>
</test>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
- I store the test elements whose Cell2 contains a single value in the atomictest variable.
- For each test, I check whether Cell2 contains a comma. If it does, I tokenize Cell2 on the comma and check whether each token already appears in atomictest. If it does not, I add the Cell1 and Cell2 values to the output.
- I would like to add each newly emitted Cell1 and Cell2 pair to the atomictest variable, so that the same Cell2 value is skipped the next time it occurs. How can I do this?
The output I currently get:
<test>
<Cell1>John</Cell1>
<Cell2>A</Cell2>
</test>
<test>
<Cell1>John</Cell1>
<Cell2>B</Cell2>
</test>
<test>
<Cell1>Jade</Cell1>
<Cell2>Y</Cell2>
</test>
<test>
<Cell1>John</Cell1>
<Cell2>C</Cell2>
</test>
<test>
<Cell1>Jade</Cell1>
<Cell2>X</Cell2>
</test>
<test>
<Cell1>John</Cell1>
<Cell2>C</Cell2>
</test>
<test>
<Cell1>John</Cell1>
<Cell2>D</Cell2>
</test>
<test>
<Cell1>Jade</Cell1>
<Cell2>Y</Cell2>
</test>
The desired output is:
<test>
<Cell1>John</Cell1>
<Cell2>A</Cell2>
</test>
<test>
<Cell1>John</Cell1>
<Cell2>B</Cell2>
</test>
<test>
<Cell1>Jade</Cell1>
<Cell2>Y</Cell2>
</test>
<test>
<Cell1>John</Cell1>
<Cell2>C</Cell2>
</test>
<test>
<Cell1>Jade</Cell1>
<Cell2>X</Cell2>
</test>
<test>
<Cell1>John</Cell1>
<Cell2>D</Cell2>
</test>
This XSLT 2.0 style-sheet...
...will transform this input...
...into...
Alternative solution
Here is an alternative single-phase solution. It is simpler, but less adaptable.
Note
Both solutions rely on the following assumptions:
Variables are read-only in XSLT. That is, you can assign a value to a variable only once, when it is declared; after that it cannot be changed.
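Since a variable cannot be updated once it is bound, the "skip a Cell2 value I have already emitted" requirement is usually expressed without any running state. What follows is only a minimal sketch of one way to do that, not necessarily the approach of any answer in this thread. It first expands every test into one pair per Cell2 token, then uses xsl:for-each-group to keep the first pair for each distinct Cell2 value. The mode name expand, the variable names pairs and names, and the wrapping sample element in the result are assumptions made for illustration. The sketch also assumes that the n-th comma-separated part of Cell2 belongs to the n-th comma-separated name in Cell1.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <xsl:template match="sample">
    <!-- Phase 1: expand each test into one (Cell1, Cell2) pair per Cell2 token -->
    <xsl:variable name="pairs">
      <xsl:apply-templates select="test" mode="expand"/>
    </xsl:variable>
    <!-- Phase 2: group by Cell2; inside a group "." is its first item,
         so later occurrences of the same Cell2 value are dropped -->
    <sample>
      <xsl:for-each-group select="$pairs/test" group-by="Cell2">
        <xsl:copy-of select="."/>
      </xsl:for-each-group>
    </sample>
  </xsl:template>

  <xsl:template match="test" mode="expand">
    <!-- Assumption: names and comma-separated Cell2 parts correspond by position -->
    <xsl:variable name="names" select="tokenize(Cell1, ',')"/>
    <xsl:for-each select="tokenize(Cell2, ',')">
      <xsl:variable name="pos" select="position()"/>
      <!-- Split the current part on spaces to get the individual values -->
      <xsl:for-each select="tokenize(normalize-space(.), ' ')">
        <test>
          <Cell1><xsl:value-of select="normalize-space($names[$pos])"/></Cell1>
          <Cell2><xsl:value-of select="."/></Cell2>
        </test>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Groups are processed in order of first appearance, so for the sample input in the question this should yield the pairs John/A, John/B, Jade/Y, John/C, Jade/X, John/D in that order. Only Cell1 and Cell2 are carried over; other children such as Cell4 and Cell6 are dropped, as in the desired output.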