I have extracted some code from CodeProject to reindent an XML document. Does anyone know how I can modify the stylesheet so that transforming an XML file writes empty tags as <tag /> instead of <tag></tag>?
// http://www.codeproject.com/Articles/43309/How-to-create-a-simple-XML-file-using-MSXML-in-C
MSXML2::IXMLDOMDocumentPtr FormatDOMDocument(MSXML2::IXMLDOMDocumentPtr pDoc)
{
    // Identity transform: copies every node unchanged and re-indents the output
    LPCSTR const static szStyleSheet =
        R"!(<?xml version="1.0" encoding="utf-8"?>)!"
        R"!(<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">)!"
        R"!( <xsl:output method="xml" indent="yes"/>)!"
        R"!( <xsl:template match="@* | node()">)!"
        R"!( <xsl:copy>)!"
        R"!( <xsl:apply-templates select="@* | node()"/>)!"
        R"!( </xsl:copy>)!"
        R"!( </xsl:template>)!"
        R"!(</xsl:stylesheet>)!";

    MSXML2::IXMLDOMDocumentPtr pXmlStyleSheet;
    pXmlStyleSheet.CreateInstance(__uuidof(MSXML2::DOMDocument60));
    pXmlStyleSheet->loadXML(szStyleSheet);

    MSXML2::IXMLDOMDocumentPtr pXmlFormattedDoc;
    pXmlFormattedDoc.CreateInstance(__uuidof(MSXML2::DOMDocument60));

    CComPtr<IDispatch> pDispatch;
    HRESULT hr = pXmlFormattedDoc->QueryInterface(IID_IDispatch, (void**)&pDispatch);
    if (SUCCEEDED(hr))
    {
        // Wrap the output document in a VARIANT; the extra AddRef balances
        // the Release that _variant_t performs when it goes out of scope
        _variant_t vtOutObject;
        vtOutObject.vt = VT_DISPATCH;
        vtOutObject.pdispVal = pDispatch;
        vtOutObject.pdispVal->AddRef();
        hr = pDoc->transformNodeToObject(pXmlStyleSheet, vtOutObject);
    }

    // By default the encoding is written as UTF-16. Change it to UTF-8:
    // <?xml version="1.0" encoding="UTF-8"?>
    MSXML2::IXMLDOMNodePtr pXMLFirstChild = pXmlFormattedDoc->GetfirstChild();
    // A map of the attribute names (version, encoding) to their values (1.0, UTF-8)
    MSXML2::IXMLDOMNamedNodeMapPtr pXMLAttributeMap = pXMLFirstChild->Getattributes();
    MSXML2::IXMLDOMNodePtr pXMLEncodNode = pXMLAttributeMap->getNamedItem(_T("encoding"));
    pXMLEncodNode->PutnodeValue(_T("UTF-8")); // encoding = UTF-8
    return pXmlFormattedDoc;
}
This stylesheet causes empty tags to be written where possible (with MSXML6):
This is achieved by avoiding xsl:copy for elements with no child elements, text, comments or processing instructions, and "manually" copying the element using xsl:element. Note that the attributes are copied too, with the nested xsl:copy-of.
For example, this XML document:
would be transformed into the following using your FormatDOMDocument function, with the updated stylesheet:
To restrict empty tags to only certain elements by name, you can adjust the match pattern to add a check on the element name: contains('|list|of|element|names|', concat('|',name(),'|')). Note that the list of names is separated with |, that there is a | at the start and end of the list too, and that we concatenate the element name with those delimiters as well. This trick lets us use contains (which just matches any substring) to achieve the effect of searching in the list.
For example, allowing empty tags for the non-empty, empty-2, empty-4 and abc:empty-with-namespace elements in my previous example, the updated stylesheet would be:
and the output of FormatDOMDocument would become:
Note that although we specified non-empty as a possible empty tag in that list, it doesn't come out as empty, because it actually has a text node (which is what we want). Also note that empty wasn't in our list, and it comes out with a closing tag as <empty></empty>, which is what we wanted in this case too (similarly for empty-3).
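The delimiter trick can be sanity-checked outside XSLT. Below is a minimal C++ sketch in which IsInDelimitedList is a hypothetical helper (not part of MSXML or of the stylesheet) that mimics the XPath test contains(list, concat('|', name(), '|')) using std::string::find. The predicate shown in the comment is an assumption about how the test would be attached to the match pattern.

```cpp
#include <string>

// In the stylesheet, the equivalent test would presumably be a predicate
// on the match pattern, for example (assumption):
//   *[not(node())][contains('|non-empty|empty-2|empty-4|abc:empty-with-namespace|',
//                           concat('|', name(), '|'))]

// The list used in the example, with a '|' before, between and after the names.
static const char szAllowedNames[] =
    "|non-empty|empty-2|empty-4|abc:empty-with-namespace|";

// Hypothetical helper: wraps the name in the same delimiters and does a
// plain substring search, exactly as contains() does in XPath 1.0.
bool IsInDelimitedList(const std::string& list, const std::string& name)
{
    const std::string needle = "|" + name + "|";
    return list.find(needle) != std::string::npos;
}
```

With this list, IsInDelimitedList(szAllowedNames, "empty-2") is true, while "empty" and "empty-3" are rejected. A bare substring search for "empty" would have matched, because that text occurs inside several of the listed names; the surrounding delimiters are what turn the substring test into a whole-name test.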