I have an XML file, generated from a PHP cURL response, that is then transformed to CSV such that each mods element below becomes one line. I've produced some CSV using the stylesheet in the checked answer here, but it's not quite what I'm trying to do.
My XML (simplified):
<xml>
  <mods xmlns="http://www.loc.gov/mods/">
    <typeOfResource>StillImage</typeOfResource>
    <titleInfo ID="T-1">
      <title>East Bay Street</title>
    </titleInfo>
    <subject ID="SBJ-2">
      <topic>Railroads</topic>
    </subject>
    <subject ID="SBJ-3">
      <geographic>Low Country</geographic>
    </subject>
    <subject ID="SBJ-4">
      <geographic>Charleston (S.C.)</geographic>
    </subject>
    <subject ID="SBJ-7">
      <hierarchicalGeographic>
        <county>Charleston County (S.C.)</county>
      </hierarchicalGeographic>
    </subject>
    <physicalDescription>
      <form>Images</form>
    </physicalDescription>
    <note>Caption: 'War Views. No.179. Ruins of the Northeastern Railway Depot, Charleston.' This is a stereograph image which measures 3 1/2" X 7". Date assumed to be 1865.</note>
    <originInfo>
      <dateCreated>1865</dateCreated>
    </originInfo>
    <location>
      <physicalLocation>The Charleston Museum Archives</physicalLocation>
    </location>
    <relatedItem type="host">
      <titleInfo>
        <title>Charleston Museum Civil War Photographs</title>
      </titleInfo>
    </relatedItem>
  </mods>
  <mods>
    more nodes...
  </mods>
</xml>
My current XSL from the stack post above:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="iso-8859-1"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="/*/child::*">
    <xsl:for-each select="child::*">
      <xsl:if test="position() != last()"><xsl:value-of select="normalize-space(.)"/>, </xsl:if>
      <xsl:if test="position() = last()"><xsl:value-of select="normalize-space(.)"/><xsl:text>&#10;</xsl:text></xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
This outputs CSV where each MODS element is one line, and each child is a comma-separated value on that line. Would it be possible to modify the XSL such that each MODS element is still one line, but the values of matching children are grouped? Something like:
StillImage,East Bay Street,Railroads,**Low Country;Charleston (S.C.)**,Charleston County (S.C.),Images
.......and so on.
So when nodes match (like the multiple subject -> geographic entries), they are grouped and semicolon-separated rather than taking up multiple comma-separated values? Hopefully I'm making some sense. Thanks!
One way to do this is to first change your XSLT so that it only selects the elements that do not have an immediately preceding sibling with the same child name (i.e. select the elements that are the 'first' in each group).
Then, you can define a variable to get the following sibling if (and only if) it has the same name, so you can check whether the current element is indeed in a group of more than one.
(I am not sure if you actually wanted the ** in the final results, or whether they are just there to highlight the group! I am keeping them in my example, but obviously it will be easy enough to remove the relevant lines of code).
To group together the following siblings with the same name, you could call a recursive template for the first following sibling.
Then, within this template, you would call it recursively whenever the immediately following sibling has the same name.
Try the following XSLT
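A minimal XSLT 1.0 sketch along these lines. It assumes that two adjacent elements belong to the same group when both their own names and the names of their first child elements match. The `group` mode and the `key` and `prev`/`next` names are just illustrative choices.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="iso-8859-1"/>
  <xsl:strip-space elements="*"/>

  <!-- One CSV line per child of the root element (each mods record) -->
  <xsl:template match="/*/*">
    <xsl:for-each select="*">
      <!-- Assumed grouping key: element name plus name of its first child element -->
      <xsl:variable name="key" select="concat(name(), '|', name(*[1]))"/>
      <xsl:variable name="prev" select="preceding-sibling::*[1]"/>
      <!-- Only start a field at the first element of a run of matching siblings -->
      <xsl:if test="not($prev) or concat(name($prev), '|', name($prev/*[1])) != $key">
        <!-- The next sibling, but only if it belongs to the same group -->
        <xsl:variable name="next"
            select="following-sibling::*[1][concat(name(), '|', name(*[1])) = $key]"/>
        <xsl:if test="$prev"><xsl:text>,</xsl:text></xsl:if>
        <!-- Remove the two ** lines if the markers are not wanted -->
        <xsl:if test="$next"><xsl:text>**</xsl:text></xsl:if>
        <xsl:value-of select="normalize-space(.)"/>
        <xsl:apply-templates select="$next" mode="group">
          <xsl:with-param name="key" select="$key"/>
        </xsl:apply-templates>
        <xsl:if test="$next"><xsl:text>**</xsl:text></xsl:if>
      </xsl:if>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Recursive step: emit this group member, then move on to the
       immediately following sibling if it has the same key -->
  <xsl:template match="*" mode="group">
    <xsl:param name="key"/>
    <xsl:text>;</xsl:text>
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:apply-templates
        select="following-sibling::*[1][concat(name(), '|', name(*[1])) = $key]"
        mode="group">
      <xsl:with-param name="key" select="$key"/>
    </xsl:apply-templates>
  </xsl:template>
</xsl:stylesheet>
```

Note that, like the stylesheet in the question, this does not quote values, so a value that itself contains a comma (such as the note text) will still be split into several fields by a CSV reader.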
Now, if you can use XSLT 2.0, things become much, much easier, because you can use the xsl:for-each-group construct which, among other things, has a 'group-adjacent' attribute for grouping adjacent elements. You can also do away with the recursive template by using the improved xsl:value-of, which has a 'separator' attribute that is used when multiple items are selected.
For XSLT 2.0, the following should also work
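A sketch of the XSLT 2.0 variant, assuming an XSLT 2.0 processor (Saxon, for example) and the same grouping rule as before: adjacent elements are grouped when their own names and the names of their first child elements match.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="iso-8859-1"/>
  <xsl:strip-space elements="*"/>

  <!-- One CSV line per child of the root element (each mods record) -->
  <xsl:template match="/*/*">
    <!-- Assumed grouping key: element name plus name of its first child element -->
    <xsl:for-each-group select="*"
        group-adjacent="concat(name(), '|', name(*[1]))">
      <!-- position() here counts groups, so this separates the CSV fields -->
      <xsl:if test="position() &gt; 1"><xsl:text>,</xsl:text></xsl:if>
      <!-- Remove the two ** lines if the markers are not wanted -->
      <xsl:if test="count(current-group()) &gt; 1"><xsl:text>**</xsl:text></xsl:if>
      <!-- All members of the group, joined with semicolons -->
      <xsl:value-of select="current-group()/normalize-space(.)" separator=";"/>
      <xsl:if test="count(current-group()) &gt; 1"><xsl:text>**</xsl:text></xsl:if>
    </xsl:for-each-group>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Here xsl:for-each-group replaces both the 'first in group' test and the recursive template, and the separator attribute takes care of the semicolons.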