I have a large (200k lines) XML file as a report from a tool (reporting on a VSS database). It consists of a large number of <file> elements like this:
<file>
  <name>file.bat</name>
  <version>111</version>
  <checkedout>No</checkedout>
  <binary>Text</binary>
  <vss_path>$/Code/file.bat</vss_path>
  <original_path>C:\code\file.bat</original_path>
  <action>Labeled '1.2.3.4'</action>
  <date>27/09/2013 09:08:00</date>
  <comment></comment>
  <label>1.2.3.4</label>
  <label_comment></label_comment>
  <user>John</user>
  <shared_links>
    <shared_link>$/Beta_1</shared_link>
    <shared_link>$/Branches/New_Feature</shared_link>
  </shared_links>
</file>
I want to find only the <file> elements which have at least one <shared_link> starting with "$/Beta".
Ideally, all I want from each matching element is the <name>, <vss_path> and (matching) <shared_link> parts, but that's not essential.
I'm not well-versed in XSLT/XPath, but I believe they can do something like this?
This XSLT stylesheet:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="root">
    <xsl:copy>
      <xsl:apply-templates select="file[shared_links[shared_link[starts-with(., '$/Beta')]]]"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="file">
    <xsl:copy>
      <xsl:apply-templates select="name | vss_path | shared_links"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="shared_links">
    <xsl:copy>
      <xsl:apply-templates select="shared_link[starts-with(., '$/Beta')]"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
when applied to this input XML (the same as yours, but wrapped in a <root> element and with an extra, non-matching file added):
<root>
  <file>
    <name>file.bat</name>
    <version>111</version>
    <checkedout>No</checkedout>
    <binary>Text</binary>
    <vss_path>$/Code/file.bat</vss_path>
    <original_path>C:\code\file.bat</original_path>
    <action>Labeled '1.2.3.4'</action>
    <date>27/09/2013 09:08:00</date>
    <comment></comment>
    <label>1.2.3.4</label>
    <label_comment></label_comment>
    <user>John</user>
    <shared_links>
      <shared_link>$/Alpha_1</shared_link>
      <shared_link>$/Branches/New_Feature</shared_link>
    </shared_links>
  </file>
  <file>
    <name>file.bat</name>
    <version>111</version>
    <checkedout>No</checkedout>
    <binary>Text</binary>
    <vss_path>$/Code/file.bat</vss_path>
    <original_path>C:\code\file.bat</original_path>
    <action>Labeled '1.2.3.4'</action>
    <date>27/09/2013 09:08:00</date>
    <comment></comment>
    <label>1.2.3.4</label>
    <label_comment></label_comment>
    <user>John</user>
    <shared_links>
      <shared_link>$/Beta_1</shared_link>
      <shared_link>$/Branches/New_Feature</shared_link>
    </shared_links>
  </file>
</root>
produces the following output XML:
<root>
  <file>
    <name>file.bat</name>
    <vss_path>$/Code/file.bat</vss_path>
    <shared_links>
      <shared_link>$/Beta_1</shared_link>
    </shared_links>
  </file>
</root>
Use
<xsl:template match="/">
  <xsl:apply-templates select="//file[shared_links/shared_link[starts-with(., '$/Beta')]]"/>
</xsl:template>
<xsl:template match="file">
  <xsl:copy-of select="name | vss_path | shared_links/shared_link"/>
</xsl:template>
to output those elements. Note, however, that the result is then an XML fragment with multiple top-level elements; if you want a well-formed XML document, change the first template to
<xsl:template match="/">
  <root>
    <xsl:apply-templates select="//file[shared_links/shared_link[starts-with(., '$/Beta')]]"/>
  </root>
</xsl:template>
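For completeness, here is a minimal sketch of how those two templates might be assembled into a full XSLT 1.0 stylesheet. Two things in it are my own assumptions rather than part of the templates above: each match is wrapped in a <file> element so the results stay grouped, and the starts-with() predicate is repeated on <shared_link> so that only the matching links are copied (the plain shared_links/shared_link path copies every link of a matching file). The xsl:output settings are also just one reasonable choice.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Start at the document node and visit only the <file> elements that
       have at least one <shared_link> beginning with '$/Beta'. -->
  <xsl:template match="/">
    <root>
      <xsl:apply-templates select="//file[shared_links/shared_link[starts-with(., '$/Beta')]]"/>
    </root>
  </xsl:template>

  <!-- Assumption: keep one <file> wrapper per match and copy only the
       links that match; drop the predicate to copy all links instead. -->
  <xsl:template match="file">
    <file>
      <xsl:copy-of select="name | vss_path | shared_links/shared_link[starts-with(., '$/Beta')]"/>
    </file>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the two-file sample input shown earlier, this should emit a single <file> under <root>, containing its <name>, its <vss_path> and the one <shared_link> with the value $/Beta_1.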