Merge multiple XML files at group levels with XSLT

Posted 2019-05-23 11:55

First, let me say that I have enjoyed reading dozens of tips about merging multiple XML files. I've also enjoyed implementing a good number of them. But I still haven't achieved my goal.

I don't want to simply concatenate the XML files so that one is repeated after another in the result. I have groups with repeating elements, and each group needs to be merged:

<SAN>
  <EQLHosts>
    <WindowsHosts>
      <WindowsHost>
        more data and structures down here...
      </WindowsHost>
    </WindowsHosts>
    <LinuxHosts>
      <LinuxHost>
        ...and here...
      </LinuxHost>
    </LinuxHosts>
  </EQLHosts>
</SAN>

Each of the individual XML files might have Windows and/or Linux hosts. So if XML file 1 has data for Windows hosts A, B and C, and XML file 2 has data for Windows hosts D, E and F, the resulting XML should look like:

<SAN>
  <EQLHosts>
    <WindowsHosts>
      <WindowsHost>
        <Name>A</Name>
      </WindowsHost>
      <WindowsHost>
        <Name>B</Name>
      </WindowsHost>
      <WindowsHost>
        <Name>C</Name>
      </WindowsHost>
      <WindowsHost>
        <Name>D</Name>
      </WindowsHost>
      <WindowsHost>
        <Name>E</Name>
      </WindowsHost>
      <WindowsHost>
        <Name>F</Name>
      </WindowsHost>
    </WindowsHosts>
    <LinuxHosts>
      <LinuxHost/>
    </LinuxHosts>
  </EQLHosts>
</SAN>

I have used this XSLT, among others, to try to get this to work:

<?xml version="1.0"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:variable name="file1" select="document('CorralData1.xml')"/>
  <xsl:variable name="file2" select="document('CorralData2.xml')"/>
  <xsl:variable name="file3" select="document('CorralData3.xml')"/>

  <xsl:template match="/">
    <SAN>
      <xsl:copy-of select="/SAN/*"/>
      <xsl:copy-of select="$file1/SAN/*"/>
      <xsl:copy-of select="$file2/SAN/*"/>
      <xsl:copy-of select="$file3/SAN/*"/>
    </SAN>
  </xsl:template>

</xsl:stylesheet>

This stylesheet produces a combined XML file, with all data all the way down the tree included correctly, but with multiple instances of WindowsHosts. I don't want that.

Is there a way to tell XSLT how to do this with a minimum of syntax, or do I need to add each element and sub-element specifically in the XSLT file?
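One way to avoid the repeated WindowsHosts wrappers is to write each group wrapper once and copy only the repeating host elements from every document into it. The following is a minimal sketch, not a tested solution. It assumes the hosts sit at /SAN/EQLHosts/..., as in the sample above, and that the three CorralData files are next to the stylesheet. The variable name docs is only illustrative.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>

  <!-- The main input plus the three extra files, as one set of document roots.
       File names are taken from the stylesheet above. -->
  <xsl:variable name="docs"
                select="/ | document('CorralData1.xml')
                          | document('CorralData2.xml')
                          | document('CorralData3.xml')"/>

  <xsl:template match="/">
    <SAN>
      <EQLHosts>
        <!-- Each wrapper is written once; only its children are copied in. -->
        <WindowsHosts>
          <xsl:copy-of select="$docs/SAN/EQLHosts/WindowsHosts/WindowsHost"/>
        </WindowsHosts>
        <LinuxHosts>
          <xsl:copy-of select="$docs/SAN/EQLHosts/LinuxHosts/LinuxHost"/>
        </LinuxHosts>
      </EQLHosts>
    </SAN>
  </xsl:template>

</xsl:stylesheet>
```

Each group still has to be named once, but nothing below the host level needs to be spelled out because copy-of takes the whole subtree. Any other children of SAN or EQLHosts are not carried over by this sketch, and the relative order of hosts from different files may depend on the processor.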


I should have checked. But I went ahead and used collection() and got a solution to work perfectly using the Saxon HE XSLT processor.

But I'm running in an InfoPath environment, and there's only an XSLT 1.0 processor. Does anyone have a recommendation for replacing the collection() function in an XSLT 1.0 environment? Can I go back to using document() in some way?


So I now have this file...

<?xml version="1.0" encoding="windows-1252"?>

<files>
    <file name="CorralData1.xml"/>
    <file name="CorralData2.xml"/>
</files>

...which I use with a stylesheet containing...

<xsl:variable name="windowsHosts" select="/SAN/WindowsHosts/WindowsHost"/>
<xsl:variable name="vmwareHosts" select="/SAN/VMwareHosts/VMwareHost"/>
<xsl:variable name="linuxHosts" select="/SAN/LinuxHosts/LinuxHost"/>

<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="/">
    <xsl:for-each select="/files/file">
        <xsl:apply-templates select="document(@name)/SAN"/>
    </xsl:for-each>
    <SAN>
        <EQLHosts>
            <WindowsHosts>
                <xsl:for-each select="$windowsHosts">
                    <xsl:copy-of select="."/>
                </xsl:for-each>
            </WindowsHosts>
            <VMwareHosts>
                <xsl:for-each select="$vmwareHosts">
                    <xsl:copy-of select="."/>
                </xsl:for-each>                 
            </VMwareHosts>
            <LinuxHosts>
                <xsl:for-each select="$linuxHosts">
                    <xsl:copy-of select="."/>
                </xsl:for-each>                 
            </LinuxHosts>
        </EQLHosts>
    </SAN>
</xsl:template>

...but this gets me multiple /SAN roots. I'm close but something's still a little off.
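Two things seem to cause this. The for-each over /files/file runs the identity template on every loaded SAN, so each file's whole tree is written out before the hand-built SAN. The three variables are also evaluated against the file list itself, whose root is files, so they select nothing. A minimal XSLT 1.0 sketch of one possible fix follows. It assumes the /SAN/EQLHosts/... layout from the first sample (the stylesheet above omits EQLHosts from its paths), and the variable name docs is only illustrative.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>

  <!-- document() accepts a node-set: every @name in the file list is loaded,
       and the result is the set of all those document roots. -->
  <xsl:variable name="docs" select="document(/files/file/@name)"/>

  <!-- Select the hosts from the loaded documents, not from the file list. -->
  <xsl:variable name="windowsHosts" select="$docs/SAN/EQLHosts/WindowsHosts/WindowsHost"/>
  <xsl:variable name="vmwareHosts" select="$docs/SAN/EQLHosts/VMwareHosts/VMwareHost"/>
  <xsl:variable name="linuxHosts" select="$docs/SAN/EQLHosts/LinuxHosts/LinuxHost"/>

  <xsl:template match="/">
    <!-- A single SAN root; no per-file apply-templates. -->
    <SAN>
      <EQLHosts>
        <WindowsHosts>
          <xsl:copy-of select="$windowsHosts"/>
        </WindowsHosts>
        <VMwareHosts>
          <xsl:copy-of select="$vmwareHosts"/>
        </VMwareHosts>
        <LinuxHosts>
          <xsl:copy-of select="$linuxHosts"/>
        </LinuxHosts>
      </EQLHosts>
    </SAN>
  </xsl:template>

</xsl:stylesheet>
```

The relative file names are resolved against the location of the file list, so the CorralData files are expected next to it. The identity template is no longer needed because copy-of already copies each host subtree.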

Tags: xml xslt
2 answers
我欲成王,谁敢阻挡
Reply #2 · 2019-05-23 12:26

I used two XSLT files for this operation. The first simply appends all the files:

<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="/">
    <SAN>
        <xsl:apply-templates select="document('MainDataSource.xml')/SAN/*"/>
        <xsl:apply-templates select="document('CorralData1.xml')/SAN/*"/>
        <xsl:apply-templates select="document('CorralData2.xml')/SAN/*"/>
    </SAN>
</xsl:template>

and the second merges the data by group:

<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="/SAN">
    <SAN>
        <ClientProfile>
        </ClientProfile>
        <STACKMEMBERS>
            <xsl:for-each select="/SAN/STACKMEMBERS/STACKMEMBER">
                <xsl:copy-of select="."/>
            </xsl:for-each>
        </STACKMEMBERS>
        <Force10StackMembers>
            <xsl:for-each select="/SAN/Force10StackMembers/Force10StackMember">
                <xsl:copy-of select="."/>
            </xsl:for-each>
        </Force10StackMembers>
    </SAN>
</xsl:template>
Deceive 欺骗
Reply #3 · 2019-05-23 12:30

What I would do is use distinct-values() to get each unique host name. You could also use collection() to make loading the files a little easier. (The collection URI syntax depends on the implementation. I used Saxon 9.4.)

Example...

Input files in the directory "input_dir"...

CorralData1.xml

<SAN>
    <EQLHosts>
        <WindowsHosts>
            <WindowsHost>
                <Name>Windows-A</Name>
            </WindowsHost>
            <WindowsHost>
                <Name>Windows-B</Name>
            </WindowsHost>
        </WindowsHosts>
        <LinuxHosts>
            <LinuxHost>
                <Name>Linux-A</Name>
            </LinuxHost>
            <LinuxHost>
                <Name>Linux-B</Name>
            </LinuxHost>
        </LinuxHosts>
    </EQLHosts>
</SAN>

CorralData2.xml (Windows-A and Windows-B are repeated)

<SAN>
    <EQLHosts>
        <WindowsHosts>
            <WindowsHost>
                <Name>Windows-C</Name>
            </WindowsHost>
            <WindowsHost>
                <Name>Windows-D</Name>
            </WindowsHost>
            <WindowsHost>
                <Name>Windows-A</Name>
            </WindowsHost>
            <WindowsHost>
                <Name>Windows-B</Name>
            </WindowsHost>
        </WindowsHosts>
        <LinuxHosts>
            <LinuxHost>
                <Name>Linux-C</Name>
            </LinuxHost>
            <LinuxHost>
                <Name>Linux-D</Name>
            </LinuxHost>
        </LinuxHosts>
    </EQLHosts>
</SAN>

CorralData3.xml (Windows-A and Windows-B are repeated)

<SAN>
    <EQLHosts>
        <WindowsHosts>
            <WindowsHost>
                <Name>Windows-E</Name>
            </WindowsHost>
            <WindowsHost>
                <Name>Windows-F</Name>
            </WindowsHost>
            <WindowsHost>
                <Name>Windows-A</Name>
            </WindowsHost>
            <WindowsHost>
                <Name>Windows-B</Name>
            </WindowsHost>          
        </WindowsHosts>
        <LinuxHosts>
            <LinuxHost>
                <Name>Linux-E</Name>
            </LinuxHost>
            <LinuxHost>
                <Name>Linux-F</Name>
            </LinuxHost>
        </LinuxHosts>
    </EQLHosts>
</SAN>

XSLT 2.0

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:variable name="collection">
        <xsl:copy-of select="collection('input_dir?strip-space=yes;select=*.xml')/*"/>
    </xsl:variable>
    <xsl:variable name="windowsHosts" select="distinct-values($collection/SAN/EQLHosts/WindowsHosts/WindowsHost/Name)"/>
    <xsl:variable name="linuxHosts" select="distinct-values($collection/SAN/EQLHosts/LinuxHosts/LinuxHost/Name)"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="/">
        <SAN>
            <EQLHosts>
                <WindowsHosts>
                    <xsl:for-each select="$windowsHosts">
                        <xsl:apply-templates select="($collection/SAN/EQLHosts/WindowsHosts/WindowsHost[Name=current()])[1]"/>
                    </xsl:for-each>
                </WindowsHosts>
                <LinuxHosts>
                    <xsl:for-each select="$linuxHosts">
                        <xsl:apply-templates select="($collection/SAN/EQLHosts/LinuxHosts/LinuxHost[Name=current()])[1]"/>
                    </xsl:for-each>                 
                </LinuxHosts>
            </EQLHosts>
        </SAN>
    </xsl:template>

</xsl:stylesheet>

Output

<SAN>
    <EQLHosts>
        <WindowsHosts>
            <WindowsHost>
                <Name>Windows-A</Name>
            </WindowsHost>
            <WindowsHost>
                <Name>Windows-B</Name>
            </WindowsHost>
            <WindowsHost>
                <Name>Windows-C</Name>
            </WindowsHost>
            <WindowsHost>
                <Name>Windows-D</Name>
            </WindowsHost>
            <WindowsHost>
                <Name>Windows-E</Name>
            </WindowsHost>
            <WindowsHost>
                <Name>Windows-F</Name>
            </WindowsHost>
        </WindowsHosts>
        <LinuxHosts>
            <LinuxHost>
                <Name>Linux-A</Name>
            </LinuxHost>
            <LinuxHost>
                <Name>Linux-B</Name>
            </LinuxHost>
            <LinuxHost>
                <Name>Linux-C</Name>
            </LinuxHost>
            <LinuxHost>
                <Name>Linux-D</Name>
            </LinuxHost>
            <LinuxHost>
                <Name>Linux-E</Name>
            </LinuxHost>
            <LinuxHost>
                <Name>Linux-F</Name>
            </LinuxHost>
        </LinuxHosts>
    </EQLHosts>
</SAN>
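
Since the question mentions an XSLT 1.0-only environment, where distinct-values() and collection() are not available, the same de-duplication could be approximated with Muenchian grouping. The sketch below is an assumption-laden alternative, not part of the solution above. It expects a single input document in which the files have already been appended under one SAN root (for example by an append-only first pass like the one in the earlier answer), and the key names winByName and linByName are made up for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Index every host by its Name. In XSLT 1.0 a key only covers the
       document of the context node, hence the pre-merged single input. -->
  <xsl:key name="winByName" match="WindowsHost" use="Name"/>
  <xsl:key name="linByName" match="LinuxHost" use="Name"/>

  <xsl:template match="/">
    <SAN>
      <EQLHosts>
        <WindowsHosts>
          <!-- Keep a host only if it is the first one indexed under its Name. -->
          <xsl:copy-of select="/SAN/EQLHosts/WindowsHosts/WindowsHost
                               [generate-id() = generate-id(key('winByName', Name)[1])]"/>
        </WindowsHosts>
        <LinuxHosts>
          <xsl:copy-of select="/SAN/EQLHosts/LinuxHosts/LinuxHost
                               [generate-id() = generate-id(key('linByName', Name)[1])]"/>
        </LinuxHosts>
      </EQLHosts>
    </SAN>
  </xsl:template>

</xsl:stylesheet>
```

The paths also match when the merged input contains several EQLHosts or WindowsHosts wrappers, since each step selects across all of them. Hosts are kept in order of first appearance, and a host without a Name child would be dropped by the predicate.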