Remove empty XML tags

I am looking for a good approach that can remove empty tags from XML efficiently. What do you recommend? Regex? XDocument? XmlTextReader?

For example,

const string original = 
    @"<?xml version=""1.0"" encoding=""utf-16""?>
    <pet>
        <cat>Tom</cat>
        <pig />
        <dog>Puppy</dog>
        <snake></snake>
        <elephant>
            <africanElephant></africanElephant>
            <asianElephant>Biggy</asianElephant>
        </elephant>
        <tiger>
            <tigerWoods></tigerWoods>       
            <americanTiger></americanTiger>
        </tiger>
    </pet>";

Could become:

const string expected =
    @"<?xml version=""1.0"" encoding=""utf-16""?>
    <pet>
        <cat>Tom</cat>
        <dog>Puppy</dog>
        <elephant>
            <asianElephant>Biggy</asianElephant>
        </elephant>
    </pet>";

Tags: c# .net xml linq-to-xml

6 Answers

放我归山

Reply #2 · 2019-01-18 04:57

This is meant to be an improvement on the accepted answer to handle attributes:

XDocument xd = XDocument.Parse(original);
xd.Descendants()
    .Where(e => (e.Attributes().All(a => a.IsNamespaceDeclaration || string.IsNullOrWhiteSpace(a.Value))
            && string.IsNullOrWhiteSpace(e.Value)
            && e.Descendants().SelectMany(c => c.Attributes()).All(ca => ca.IsNamespaceDeclaration || string.IsNullOrWhiteSpace(ca.Value))))
    .Remove();

The idea is to check that every attribute on an element is empty before removing the element. Empty descendants can also carry non-empty attributes, so a third condition checks that all attributes among the element's descendants are empty as well. Consider the following document, with node8 added:

<root>
  <node />
  <node2 blah='' adf='2'></node2>
  <node3>
    <child />
  </node3>
  <node4></node4>
  <node5><![CDATA[asdfasdf]]></node5>
  <node6 xmlns='urn://blah' d='a'/>
  <node7 xmlns='urn://blah2' />
  <node8>
     <child2 d='a' />
  </node8>
</root>

This would become:

<root>
  <node2 blah="" adf="2"></node2>
  <node5><![CDATA[asdfasdf]]></node5>
  <node6 xmlns="urn://blah" d="a" />
  <node8>
    <child2 d="a" />
  </node8>
</root>

Both the original answer and the earlier improved version would lose the node2, node6, and node8 nodes. Checking e.IsEmpty would work if you only want to strip out nodes like <node />, but it is redundant if you want to remove both <node /> and <node></node>. If you also need to remove empty attributes, you could do this:

xd.Descendants().Attributes().Where(a => string.IsNullOrWhiteSpace(a.Value)).Remove();
xd.Descendants()
  .Where(e => (e.Attributes().All(a => a.IsNamespaceDeclaration))
            && string.IsNullOrWhiteSpace(e.Value))
  .Remove();

which would give you:

<root>
  <node2 adf="2"></node2>
  <node5><![CDATA[asdfasdf]]></node5>
  <node6 xmlns="urn://blah" d="a" />
</root>


Reply #3 · 2019-01-18 04:58

Loading your original into an XDocument and using the following code gives your desired output:

var document = XDocument.Parse(original);
document.Descendants()
        .Where(e => e.IsEmpty || String.IsNullOrWhiteSpace(e.Value))
        .Remove();


▲ chillily

Reply #4 · 2019-01-18 04:58

As always, it depends on your requirements.

Do you know what form the empty tags will take (e.g. <pig />, <pig></pig>, etc.)? I usually do not recommend regular expressions here: they are really useful, but at the same time they are evil. A string.Replace approach is also problematic unless your XML has a fixed, predictable structure.

Finally, I would recommend using an XML parser (make sure your input is valid XML).

var doc = XDocument.Parse(original);
var emptyElements = from descendant in doc.Descendants()
                    where descendant.IsEmpty || string.IsNullOrWhiteSpace(descendant.Value)
                    select descendant;
emptyElements.Remove();


Melony?

Reply #5 · 2019-01-18 05:12

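XmlTextReader is preferable if performance matters (it provides fast, forward-only access to XML). You can check whether a tag is empty using the XmlReader.IsEmptyElement property. Note that IsEmptyElement is true only for the self-closing form (<pig />); it is false for <snake></snake>, so that form has to be detected separately.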
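As a rough illustration of the forward-only style, the sketch below walks a document with XmlReader and collects the names of self-closing elements. The class and method names (EmptyElementScan, SelfClosingNames) are made up for this example, and it only detects empty tags; it does not remove them.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class EmptyElementScan
{
    // Returns the names of elements written in self-closing form, e.g. <pig />.
    // Elements written as <snake></snake> are NOT reported, because
    // IsEmptyElement is false for them.
    public static List<string> SelfClosingNames(string xml)
    {
        var names = new List<string>();
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.IsEmptyElement)
                {
                    names.Add(reader.Name);
                }
            }
        }
        return names;
    }
}
```

For the input "<pet><pig /><snake></snake></pet>" this yields only "pig", which is why a reader-based solution also needs to track start/end tag pairs.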

An XDocument approach that produces the desired output:

public static bool IsEmpty(XElement n)
{
    return n.IsEmpty 
        || (string.IsNullOrEmpty(n.Value) 
            && (!n.HasElements || n.Elements().All(IsEmpty)));
}

var doc = XDocument.Parse(original);
var emptyNodes = doc.Descendants().Where(IsEmpty);
foreach (var emptyNode in emptyNodes.ToArray())
{
    emptyNode.Remove();
}


▲ chillily

Reply #6 · 2019-01-18 05:14

Anything you use will have to pass through the file at least once. If it's just a single, known tag name, then regex is your friend; otherwise use a stack approach. Start with the parent tag, and whenever a tag has a sub-tag, push it onto the stack. If you find an empty tag, remove it. Once you have gone through the child tags and reached the end tag of the element on top of the stack, pop it and check it as well; if it is now empty, remove it too. This way you can remove all empty tags, including tags that contain only empty children.
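The stack idea can be sketched in a single pass with XmlReader. This is only one possible reading of the description above, not the answerer's own code: StackPruner, Prune, and CloseTop are hypothetical names, the document element is always kept, and attributes are ignored entirely (the question's sample has none), so it is not a drop-in solution for documents that rely on attributes.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class StackPruner
{
    // Rebuilds the document in one pass, attaching an element to its parent
    // only if it turned out to be non-empty. Attributes are ignored (assumption).
    public static XElement Prune(string xml)
    {
        var open = new Stack<XElement>(); // elements whose end tag has not been seen yet
        XElement root = null;

        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        open.Push(new XElement(XName.Get(reader.LocalName, reader.NamespaceURI)));
                        // <pig /> has no separate end tag, so close it right away.
                        if (reader.IsEmptyElement) root = CloseTop(open, root);
                        break;
                    case XmlNodeType.Text:
                        open.Peek().Add(new XText(reader.Value));
                        break;
                    case XmlNodeType.CDATA:
                        open.Peek().Add(new XCData(reader.Value));
                        break;
                    case XmlNodeType.EndElement:
                        root = CloseTop(open, root);
                        break;
                }
            }
        }
        return root;
    }

    // Pops the element on top of the stack and keeps it only if it has content.
    private static XElement CloseTop(Stack<XElement> open, XElement root)
    {
        XElement closed = open.Pop();
        if (open.Count == 0) return closed; // document element: always kept

        bool isEmpty = !closed.HasElements && string.IsNullOrWhiteSpace(closed.Value);
        if (!isEmpty) open.Peek().Add(closed);
        return root;
    }
}
```

Because empty children are never attached, a parent such as <tiger> that contains only empty children is itself empty by the time its end tag is reached, which matches the "pop it and check it as well" step.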

If you are after a regular expression, use this


三岁会撩人

Reply #7 · 2019-01-18 05:23

XDocument is probably simplest to implement, and will give adequate performance if you know your documents are reasonably small.

XmlTextReader will be faster and use less memory than XDocument when processing very large documents.

Regex is best for handling text rather than XML. It might not handle all edge cases as you would like (e.g. a tag within a CDATA section; a tag with an xmlns attribute), so is probably not a good idea for a general implementation, but may be adequate depending on how much control you have of the input XML.
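To make the CDATA edge case concrete, here is a small comparison. The regular expression is a deliberately naive pattern written for this illustration (not one taken from any answer here), and RegexVsParser with its two methods is a hypothetical helper.

```csharp
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class RegexVsParser
{
    // Naive, illustrative pattern: matches <name /> or <name></name>.
    public static string StripWithRegex(string xml)
    {
        return Regex.Replace(xml, @"<(\w+)\s*/>|<(\w+)>\s*</\2>", "");
    }

    // Parser-based removal: CDATA content is treated as text, not markup.
    public static string StripWithXDocument(string xml)
    {
        var doc = XDocument.Parse(xml);
        doc.Descendants()
           .Where(e => string.IsNullOrWhiteSpace(e.Value))
           .Remove();
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

Given "<pet><note><![CDATA[literal <pig /> text]]></note><pig /></pet>", the regex version also deletes the "<pig />" that is plain text inside the CDATA section, while the XDocument version removes only the real empty element and leaves the CDATA content untouched.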
