-->

Remove empty nodes from an XML document recursively

Posted 2019-05-09 22:37

Question:

I want to delete the empty nodes from an XML element. The XML is generated by a vendor, and I have no control over how it is generated. Because it contains a few empty nodes, I need to delete those empty nodes recursively.

The XML comes from an OMElement, and I get an Element from that object using [XMLUtils][1]. Sample XML:

<A>
  <B>
    <C>
      <C1>
        <C11>something</C11>
        <C12>something</C12>
      </C1>
    </C>
    <D>
      <D1>
        <D11>
          <D111 operation="create">
            <Node>something else</Node>
          </D111>
        </D11>
      </D1>
      <D2>
        <D21>

        </D21>
      </D2>
    </D>
  </B>
</A> 

Since D21 is an empty node, I want to delete D21. D2 is then empty as well, so I want to delete D2 too. But since D still contains D1, I don't want to delete D.

Similarly, I might get:

<A>
  <B>
    <C>

    </C>
  </B>
</A>

Now, since C is empty, I want to delete C, then B, and eventually node A. I am trying to do this using the removeChild() method of Node.

But so far I have been unable to remove them recursively. Any suggestions for removing them recursively?

I am recursively visiting each node and checking its child node count, but the count is of no help:

if (childNode.getChildNodes().getLength() == 0) {
    childNode.getParentNode().removeChild(childNode);
}
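A likely reason the length check does not help is that the line breaks and indentation inside an otherwise empty element are parsed as a text node, so the element has one child rather than zero. The sketch below illustrates this with the JDK's default, non-validating parser; the class and method names (`WhitespaceChildDemo`, `rootChildCount`) are made up for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original question.
class WhitespaceChildDemo {

    // Parses an XML string with the default (non-validating) parser settings.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Counts the child nodes of the document element, text nodes included.
    static int rootChildCount(String xml) throws Exception {
        return parse(xml).getDocumentElement().getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        // The line break between the tags is a text node, so this prints 1.
        System.out.println(rootChildCount("<D21>\n</D21>"));
        // Only an element with nothing between its tags has no children: prints 0.
        System.out.println(rootChildCount("<D21></D21>"));
    }
}
```

An emptiness test therefore has to treat whitespace-only text nodes as empty too, not just count children.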

Regards
Dheeraj Joshi

Answer 1:

This works: create a recursive function that goes deep first, then removes empty nodes on the way back up the tree. This has the effect of removing both D21 and D2.

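import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public static void main(String[] args) throws Exception {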

    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    String input = "<A><B><C><C1><C11>something</C11><C12>something</C12></C1></C><D><D1><D11><D111 operation=\"create\"><Node>something else</Node></D111></D11></D1><D2><D21></D21></D2></D></B></A>";

    Document document = builder.parse(new InputSource(new StringReader(
            input)));

    removeNodes(document);

    Transformer transformer = TransformerFactory.newInstance()
            .newTransformer();
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    StreamResult result = new StreamResult(new StringWriter());
    transformer.transform(new DOMSource(document), result);
    System.out.println(result.getWriter().toString());
}

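public static void removeNodes(Node node) {
    // Visit the children first, so each parent is checked after its children.
    // Caution: getChildNodes() returns a live list. When a child removes
    // itself, the next sibling shifts into its index and is skipped by i++.
    NodeList list = node.getChildNodes();
    for (int i = 0; i < list.getLength(); i++) {
        removeNodes(list.item(i));
    }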
    boolean emptyElement = node.getNodeType() == Node.ELEMENT_NODE
            && node.getChildNodes().getLength() == 0;
    boolean emptyText = node.getNodeType() == Node.TEXT_NODE
            && node.getNodeValue().trim().isEmpty();
    if (emptyElement || emptyText) {
        node.getParentNode().removeChild(node);
    }
}

Output

<A>
<B>
<C>
<C1>
<C11>something</C11>
<C12>something</C12>
</C1>
</C>
<D>
<D1>
<D11>
<D111 operation="create">
<Node>something else</Node>
</D111>
</D11>
</D1>
</D>
</B>
</A>


Answer 2:

I don't have enough rep to comment on @Adam's solution, but I ran into an issue with it: after a node was removed, the siblings that followed it shifted down in the node list, so the loop skipped some of them and not all empty elements were removed. The fix was to copy the children into a separate list first and make the recursive calls from that list.

Also, there was a bug that removed empty elements that had attributes.

Solution to both issues:

public static void removeEmptyNodes(Node node) {

    NodeList list = node.getChildNodes();
    List<Node> nodesToRecursivelyCall = new LinkedList<>();

    for (int i = 0; i < list.getLength(); i++) {
        nodesToRecursivelyCall.add(list.item(i));
    }

    for(Node tempNode : nodesToRecursivelyCall) {
        removeEmptyNodes(tempNode);
    }

    boolean emptyElement = node.getNodeType() == Node.ELEMENT_NODE 
          && node.getChildNodes().getLength() == 0;
    boolean emptyText = node.getNodeType() == Node.TEXT_NODE 
          && node.getNodeValue().trim().isEmpty();

    if (emptyElement || emptyText) {
        if(!node.hasAttributes()) {
            node.getParentNode().removeChild(node);
        }
    }

}
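The index shift described above can be reproduced in isolation. The following is a small sketch, assuming the JDK's built-in DOM implementation; `LiveNodeListDemo` and its method names are hypothetical and only serve to contrast a forward loop over the live list with a loop over a copied list.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original answer.
class LiveNodeListDemo {

    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // Tries to remove every child while walking the live list forward.
    // Returns how many children are left afterwards.
    static int removeWhileIterating(Element root) {
        NodeList live = root.getChildNodes();
        for (int i = 0; i < live.getLength(); i++) {
            // After this removal the next sibling takes index i and is skipped.
            root.removeChild(live.item(i));
        }
        return live.getLength();
    }

    // Copies the children first, then removes them from the copy.
    static int removeFromSnapshot(Element root) {
        NodeList live = root.getChildNodes();
        List<Node> snapshot = new ArrayList<>();
        for (int i = 0; i < live.getLength(); i++) {
            snapshot.add(live.item(i));
        }
        for (Node child : snapshot) {
            root.removeChild(child);
        }
        return live.getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<r><a/><b/><c/><d/></r>";
        System.out.println(removeWhileIterating(parseRoot(xml))); // 2 (b and d survive)
        System.out.println(removeFromSnapshot(parseRoot(xml)));   // 0
    }
}
```

Iterating the live list backwards would avoid the skip as well; copying to a list is simply the variant this answer uses.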


Answer 3:

Call getTextContent() on the top-level element of the DOM. If the method returns an empty string or null, you can remove that node, because the node and all of its child nodes are empty. If getTextContent() returns a non-empty string, call getTextContent() on every child of the current node, and so on.
See the documentation.
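One way to read this suggestion is sketched below. It is an interpretation rather than code from the answer: the names `TextContentPruner` and `prune` are invented, whitespace-only text is treated as empty via `trim()`, and the top-level element itself is left for the caller to check.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class illustrating the getTextContent() idea.
class TextContentPruner {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Removes every descendant element whose text content is blank.
    // The element passed in is never removed by this method.
    static void prune(Element element) {
        Node child = element.getFirstChild();
        while (child != null) {
            // Remember the next sibling before a possible removal.
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                String text = child.getTextContent();
                if (text == null || text.trim().isEmpty()) {
                    // No text anywhere below: drop the whole subtree at once.
                    element.removeChild(child);
                } else {
                    prune((Element) child);
                }
            }
            child = next;
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<A><B><D><D1>x</D1><D2><D21>\n</D21></D2></D></B></A>");
        prune(doc.getDocumentElement());
        System.out.println(doc.getElementsByTagName("D2").getLength()); // 0
    }
}
```

Note that this test looks only at text, so an element that carries attributes but no text is removed too.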



Answer 4:

public class RemoveEmprtElement {

public static void main(String[] args) {
    ReadFile readFile =new ReadFile();
    String strXml=readFile.readFileFromPath(new File("sampleXml4.xml"));
    RemoveEmprtElement elementEmprtElement=new RemoveEmprtElement();
    DocumentBuilder dBuilder = null;
    Document doc = null;
    try {
        dBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        doc = dBuilder.parse(new ByteArrayInputStream(strXml.getBytes()));

        elementEmprtElement.getEmptyNodes(doc);
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer trans = tf.newTransformer();
        StreamResult result = new StreamResult(new StringWriter());
        trans.transform(new DOMSource(doc), result);
        System.out.println(result.getWriter().toString());

    }catch(Exception e) {
        e.printStackTrace();
    }
}

private void getEmptyNodes(Document doc){

    try {
        XPathFactory factory = XPathFactory.newInstance();
        XPath xpath = factory.newXPath();
        XPathExpression expr = xpath.compile("//*[not(*)]");
        Object resultNS = expr.evaluate(doc, XPathConstants.NODESET);
        NodeList nodes = (NodeList) resultNS;
        for(int i =0 ; i < nodes.getLength() ; i++){
            Node node = nodes.item(i);
            boolean emptyElement = node.getNodeType() == Node.ELEMENT_NODE
                    && node.getChildNodes().getLength() == 0;
            boolean emptyText = node.getNodeType() == Node.TEXT_NODE
                    && node.getNodeValue().trim().isEmpty();

            if (emptyElement || emptyText) {
                xmlNodeRemove(doc,findPath(node));
                getEmptyNodes(doc);
            }
        } 
    }catch(Exception e) {
        e.printStackTrace();
    }

}

private void xmlNodeRemove(Document doc,String xmlNodeLocation){

    try {
        XPathFactory factory = XPathFactory.newInstance();
        XPath xpath = factory.newXPath();
        XPathExpression expr = xpath.compile(xmlNodeLocation);
        Object resultNS = expr.evaluate(doc, XPathConstants.NODESET);
        NodeList nodes = (NodeList) resultNS;
        Node node =nodes.item(0);
        if (node != null && node.getParentNode() != null && node.getParentNode().hasChildNodes()) {
            node.getParentNode().removeChild(node);
        }
    }catch(Exception e) {
        e.printStackTrace();
    }
}

// Builds a name-only path such as /A/B/D for the given node.
private String findPath(Node n) {
    if (n == null || n.getNodeName().equals("#document")) {
        return "";
    }
    return findPath(n.getParentNode()) + "/" + n.getNodeName();
}

}
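A more compact variant of the same XPath idea is sketched below, under a few assumptions: the emptiness test is pushed into the expression itself, elements with attributes are kept, and each matched node is removed directly instead of being looked up again by a name-only path (which can match several siblings that share a name). `XPathPruner` and `removeEmptyElements` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original answer.
class XPathPruner {

    // Elements with no child elements, no attributes, and only blank text.
    private static final String EMPTY_LEAF =
            "//*[not(*) and not(@*) and not(normalize-space())]";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Repeats the query until nothing matches, because removing a leaf can
    // turn its parent into a new empty leaf. Returns the number of removals.
    static int removeEmptyElements(Document doc) throws XPathExpressionException {
        XPathExpression expr = XPathFactory.newInstance().newXPath().compile(EMPTY_LEAF);
        int removed = 0;
        while (true) {
            NodeList hits = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
            if (hits.getLength() == 0) {
                return removed;
            }
            for (int i = 0; i < hits.getLength(); i++) {
                Node hit = hits.item(i);
                hit.getParentNode().removeChild(hit);
                removed++;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<A><B><D><D1>x</D1><D2>\n<D21>\n</D21>\n</D2></D></B></A>");
        System.out.println(removeEmptyElements(doc)); // 2 (D21, then D2)
    }
}
```

If every element is empty, the loop eventually removes the root element too, which leaves the document without a document element.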


Answer 5:

Just work with strings:

    Pattern emptyValueTag = Pattern.compile("\\s*<\\w+/>");
    Pattern emptyTagMultiLine = Pattern.compile("\\s*<\\w+>\n*\\s*</\\w+>");

    xml = emptyValueTag.matcher(xml).replaceAll("");

    while (xml.length() != (xml = emptyTagMultiLine.matcher(xml).replaceAll("")).length()) {
    }

    return xml;
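For reference, here is the same pair of patterns wrapped in a self-contained method, together with two inputs the patterns do not cover. This is only a sketch: `RegexPruner` and `strip` are hypothetical names, and the patterns are copied unchanged from the snippet above.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper class around the two patterns from the answer.
class RegexPruner {

    private static final Pattern SELF_CLOSED = Pattern.compile("\\s*<\\w+/>");
    private static final Pattern EMPTY_PAIR = Pattern.compile("\\s*<\\w+>\n*\\s*</\\w+>");

    // Removes self-closed tags once, then removes empty open/close pairs
    // repeatedly until the string stops changing.
    static String strip(String xml) {
        xml = SELF_CLOSED.matcher(xml).replaceAll("");
        String previous;
        do {
            previous = xml;
            xml = EMPTY_PAIR.matcher(xml).replaceAll("");
        } while (!xml.equals(previous));
        return xml;
    }

    public static void main(String[] args) {
        // Nested empties collapse over several passes: prints <A><D>x</D></A>
        System.out.println(strip("<A><B><C>\n</C></B><D>x</D></A>"));
        // \w+ does not match a space or a colon, so empty elements with
        // attributes or namespace prefixes are left untouched.
        System.out.println(strip("<A><E id=\"1\"></E><ns:F></ns:F></A>"));
    }
}
```

Leaving attribute-bearing elements alone may be desirable, but prefixed element names need a wider pattern or a DOM-based approach.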


Tags: java xml xmlnode