I've been working on a Geo application. Over time, the product's XML has grown a bit messy. The problem arises when synchronizing changes across multiple environments such as Dev, Test, etc. I'm trying to find a way to normalize the content so that editing and merging become less cumbersome and development more productive. I know it sounds crazy, and there's a lot of background, but let me skip the history and jump to the actual issue.
Here's the issue:
- Multiple sort orders are applied:
  - Sort on the reversed domain name. For example, d.c.b.a should read as a.b.c.d, and map.google.com as com.google.map, for sorting.
  - When the domain contains a non-alphanumeric character such as *, ?, [, or ], that node should come after the specific one, since its scope is wider.
  - Sort on port and path as the secondary keys.
  - Apply a similar sort order to the tags under the <tgt> element, if present.
- Eliminate the <scheme> and <port> tags when their values are generic, i.e. http / https for the scheme tag and 80 or 443 for the port tag; otherwise retain them. Also remove them when there is no value, like <scheme/>.
- Preserve all other tags and values as-is.
- Trivial things, like indenting with 2 space characters and keeping the actual data without unwanted boilerplate.
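The elimination rule in the list above can be sketched on its own, independent of the sorting. This is a minimal XSLT 1.0 sketch, not a complete solution: it assumes that <scheme> and <port> only ever appear where they should be cleaned up (the patterns match those elements anywhere in the document) and that the generic values are written in lowercase exactly as listed.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Drop whitespace-only text nodes so the serializer can re-indent the output -->
  <xsl:strip-space elements="*"/>
  <xsl:output method="xml" indent="yes"/>

  <!-- Identity template: copy every node and attribute not matched below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Suppress <scheme> when it is empty or holds a generic value (assumed lowercase) -->
  <xsl:template match="scheme[normalize-space() = '' or normalize-space() = 'http' or normalize-space() = 'https']"/>

  <!-- Suppress <port> when it is empty or holds a default port -->
  <xsl:template match="port[normalize-space() = '' or normalize-space() = '80' or normalize-space() = '443']"/>
</xsl:stylesheet>
```

The two empty templates override the identity template for the matching elements only, so everything else (including processing instructions) is copied through unchanged. The exact indentation width produced by indent="yes" is up to the processor.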
Here's a bit of the problematic XML:
XML
<?xml version='1.0' encoding='UTF-8' ?>
<?tapia chrome-version='2.0' ?>
<mapGeo>
  <a>blah</a>
  <b>blah</b>
  <maps>
    <mapIndividual>
      <src>
        <scheme>https</scheme>
        <domain>photos.yahoo.com</domain>
        <path>somepath</path>
        <query>blah</query>
      </src>
      <loc>C:\var\tmp</loc>
      <x>blah</x>
      <y>blah</y>
    </mapIndividual>
    <mapIndividual>
      <src>
        <scheme>tcp</scheme>
        <domain>map.google.com</domain>
        <port>80</port>
        <path>/value</path>
        <query>blah</query>
      </src>
      <tgt>
        <scheme>https</scheme>
        <domain>map.google.com</domain>
        <port>443</port>
        <path>/value</path>
        <query>blah</query>
      </tgt>
      <x>blah</x>
      <y>blah</y>
    </mapIndividual>
    <mapIndividual>
      <src>
        <scheme>http</scheme>
        <domain>*.c.b.a</domain>
        <path>somepath</path>
        <port>8085</port>
        <query>blah</query>
      </src>
      <tgt>
        <domain>r.q.p</domain>
        <path>somepath</path>
        <query>blah</query>
      </tgt>
      <x>blah</x>
      <y>blah</y>
    </mapIndividual>
    <mapIndividual>
      <src>
        <scheme>http</scheme>
        <domain>d.c.b.a</domain>
        <path>somepath</path>
        <port>8085</port>
        <query>blah</query>
      </src>
      <tgt>
        <domain>r.q.p</domain>
        <path>somepath</path>
        <query>blah</query>
      </tgt>
      <x>blah</x>
      <y>blah</y>
    </mapIndividual>
  </maps>
</mapGeo>
I was able to apply basic sorting on the values as-is, but couldn't figure out how to generate the reversed domain name. I came across XSLT extensions, but haven't tried them yet. Here's the beginning of the solution I was working on, which is very basic.
XSL
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Identity template: copy attributes and nodes unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Sort the children of <maps> by source domain (as-is), then by numeric port -->
  <xsl:template match="maps">
    <xsl:copy>
      <xsl:apply-templates select="*">
        <xsl:sort select="src/domain"/>
        <xsl:sort select="src/port" data-type="number"/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
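For the missing piece, the reversed domain key, one possible approach is sketched below. It assumes an XSLT 2.0 processor, since it relies on tokenize(), reverse() and string-join(). The function my:domain-sort-key and its urn:example:sort-keys namespace are hypothetical names made up for this illustration. Wildcard characters are mapped to ~ so that, under the codepoint collation, they sort after letters and digits; only the four characters *, ?, [ and ] are handled. The sketch sorts on <src> only and does not remove any tags.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:sort-keys"
    exclude-result-prefixes="xs my">
  <xsl:strip-space elements="*"/>
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical helper: 'map.google.com' becomes 'com.google.map',
       and '*.c.b.a' becomes 'a.b.c.~' so it sorts after 'a.b.c.d' -->
  <xsl:function name="my:domain-sort-key" as="xs:string">
    <xsl:param name="domain" as="xs:string?"/>
    <xsl:variable name="labels"
        select="tokenize(lower-case(normalize-space($domain)), '\.')"/>
    <xsl:sequence
        select="translate(string-join(reverse($labels), '.'), '*?[]', '~~~~')"/>
  </xsl:function>

  <!-- Identity template: copy everything not matched below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Sort <maps> children by reversed domain, then port, then path -->
  <xsl:template match="maps">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="*">
        <!-- Codepoint collation makes '.' sort before letters and '~' after them -->
        <xsl:sort select="my:domain-sort-key(src/domain)"
            collation="http://www.w3.org/2005/xpath-functions/collation/codepoint"/>
        <!-- A missing or empty port is treated as 0 (an assumption of this sketch) -->
        <xsl:sort select="number((src/port[normalize-space()], 0)[1])"/>
        <xsl:sort select="string(src/path)"/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

For the four sample domains the keys are a.b.c.d, a.b.c.~, com.google.map and com.yahoo.photos, which is the order the specification asks for. Further xsl:sort elements on tgt/domain, tgt/port and tgt/path could be appended in the same way if the <tgt> values should act as tie-breakers.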
Expected Output
<?xml version='1.0' encoding='UTF-8' ?>
<?tapia chrome-version='2.0' ?>
<mapGeo>
  <a>blah</a>
  <b>blah</b>
  <maps>
    <mapIndividual>
      <src>
        <domain>d.c.b.a</domain>
        <path>somepath</path>
        <port>8085</port>
        <query>blah</query>
      </src>
      <tgt>
        <domain>r.q.p</domain>
        <path>somepath</path>
        <query>blah</query>
      </tgt>
      <x>blah</x>
      <y>blah</y>
    </mapIndividual>
    <mapIndividual>
      <src>
        <domain>*.c.b.a</domain>
        <path>somepath</path>
        <port>8085</port>
        <query>blah</query>
      </src>
      <tgt>
        <domain>r.q.p</domain>
        <path>somepath</path>
        <query>blah</query>
      </tgt>
      <x>blah</x>
      <y>blah</y>
    </mapIndividual>
    <mapIndividual>
      <src>
        <scheme>tcp</scheme>
        <domain>map.google.com</domain>
        <path>/value</path>
        <query>blah</query>
      </src>
      <tgt>
        <domain>map.google.com</domain>
        <path>/value</path>
        <query>blah</query>
      </tgt>
      <x>blah</x>
      <y>blah</y>
    </mapIndividual>
    <mapIndividual>
      <src>
        <domain>photos.yahoo.com</domain>
        <path>somepath</path>
        <query>blah</query>
      </src>
      <loc>C:\var\tmp</loc>
      <x>blah</x>
      <y>blah</y>
    </mapIndividual>
  </maps>
</mapGeo>
Note: I'd prefer XSLT 1.0 as that's supported in the current environment. XSLT 2.0 would be a plus.
Update: I figured out how to support XSLT 2.0 and XSLT 3.0, so please ignore my previous note about XSLT 1.0.
Thank you in advance!
Cheers,
This XSLT 1.0 stylesheet (without extensions)
Output
Do note: this relies on the fact that . (dot) precedes and ~ (tilde) follows letters in alphabetical order (at least for US). Also, it might not scale well... I'm with Martin Honnen's comment: this would be better solved in XSLT 2.0.
I don't think it's possible to sort in the reverse order you seek in a single pass using XSLT 1.0. Consider the following simplified example:
XML
XSLT 1.0 (+ EXSLT node-set)
Result