How to get only elements with values using StAX

Posted 2019-07-25 19:07

Question:

I'm trying to get only the elements that have text. Example XML:

<root>
      <Item>
        <ItemID>4504216603</ItemID>
        <ListingDetails>
          <StartTime>10:00:10.000Z</StartTime>
          <EndTime>10:00:30.000Z</EndTime>
          <ViewItemURL>http://url</ViewItemURL>
            ....
           </item> 

It should print

Element Local Name:ItemID
Text:4504216603
Element Local Name:StartTime
Text:10:00:10.000Z
Element Local Name:EndTime
Text:10:00:30.000Z
Element Local Name:ViewItemURL
Text:http://url

This code also prints root, item, etc. Is this even possible? It must be, but I can't find it by searching.

XMLInputFactory inputFactory = XMLInputFactory.newInstance();
InputStream input = new FileInputStream(new File("src/main/resources/file.xml"));
XMLStreamReader xmlStreamReader = inputFactory.createXMLStreamReader(input);

while (xmlStreamReader.hasNext()) {
    int event = xmlStreamReader.next();

    if (event == XMLStreamConstants.START_ELEMENT) {
        System.out.println("Element Local Name:" + xmlStreamReader.getLocalName());
    }

    if (event == XMLStreamConstants.CHARACTERS) {
        if (!xmlStreamReader.getText().trim().equals("")) {
            System.out.println("Text:" + xmlStreamReader.getText().trim());
        }
    }
}

Edit: the incorrect behaviour is:

    Element Local Name:root
    Element Local Name:item
    Element Local Name:ItemID
    Text:4504216603
    Element Local Name:ListingDetails
    Element Local Name:StartTime
    Text:10:00:10.000Z
    Element Local Name:EndTime
    Text:10:00:30.000Z
    Element Local Name:ViewItemURL
    Text:http://url

I don't want root and the other nodes that have no text to be printed, just the output I wrote above. Thank you.

Answer 1:

Try this:

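String currentName = null;
while (xmlStreamReader.hasNext()) {
    int event = xmlStreamReader.next();

    if (event == XMLStreamConstants.START_ELEMENT) {
        // Only remember the name here. getElementText() is avoided because it
        // throws on elements that contain child elements and, by then, has
        // already consumed the child's start tag, so that child gets skipped.
        currentName = xmlStreamReader.getLocalName();
    } else if (event == XMLStreamConstants.CHARACTERS
            && currentName != null && !xmlStreamReader.isWhiteSpace()) {
        // Non-blank text directly after a start tag: print name and text.
        System.out.println("Element Local Name:" + currentName);
        System.out.println("Text:" + xmlStreamReader.getText().trim());
    } else if (event == XMLStreamConstants.END_ELEMENT) {
        currentName = null;
    }
}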
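For reference, here is a self-contained StAX sketch that can be run against the sample from the question. It buffers the text between a start tag and the next end tag, so it does not depend on the parser delivering the text in a single CHARACTERS event. The class name StaxLeafText, the method leafTexts, and the "name=text" output format are hypothetical choices for illustration. It assumes text that sits next to child elements (mixed content) can be ignored.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of the original answer.
class StaxLeafText {

    /** Returns "name=text" for every element that directly contains non-blank text. */
    static List<String> leafTexts(String xml) {
        List<String> result = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            String currentName = null;              // element whose start tag was seen last
            StringBuilder text = new StringBuilder();
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    // A new element starts: forget any text collected so far.
                    currentName = reader.getLocalName();
                    text.setLength(0);
                } else if (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA) {
                    text.append(reader.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    // Only report when this end tag closes the element opened last,
                    // i.e. no child element appeared in between.
                    String value = text.toString().trim();
                    if (currentName != null && !value.isEmpty()) {
                        result.add(currentName + "=" + value);
                    }
                    currentName = null;
                    text.setLength(0);
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return result;
    }
}
```

Calling StaxLeafText.leafTexts(xml) on a well-formed version of the sample should yield entries for ItemID, StartTime, EndTime and ViewItemURL only; container elements such as root and ListingDetails are skipped because a child start tag resets the buffer.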

SAX-based solution (works):

public class Test extends DefaultHandler {

    public static void main(String[] args) throws ParserConfigurationException, IOException, SAXException {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new File("src/file.xml"), new Test());
    }

    private String currentName;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        currentName = qName;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        String string = new String(ch, start, length);
        if (hasText(string)) {
            System.out.println(currentName);
            System.out.println(string);
        }
    }

    private boolean hasText(String string) {
        string = string.trim();
        return string.length() > 0;
    }
}
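One caveat with the handler above: a SAX parser is allowed to deliver the text of a single element in several characters() calls, in which case the name would be printed more than once. A minimal sketch of a buffering variant follows, assuming the same goal of reporting only elements that directly contain text. The class name BufferingTextHandler and the parse helper are hypothetical and not part of the original answer.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical variant of the Test handler: collects text until the end tag.
class BufferingTextHandler extends DefaultHandler {

    private final List<String> lines = new ArrayList<>();
    private final StringBuilder buffer = new StringBuilder();
    private String currentName;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        currentName = qName;
        buffer.setLength(0);          // drop whitespace collected before this tag
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length);   // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String text = buffer.toString().trim();
        // currentName is null when a child element was closed in between,
        // so container elements are never reported.
        if (currentName != null && !text.isEmpty()) {
            lines.add(currentName);
            lines.add(text);
        }
        currentName = null;
        buffer.setLength(0);
    }

    List<String> getLines() {
        return lines;
    }

    /** Parses the given XML string and returns alternating name and text entries. */
    static List<String> parse(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            BufferingTextHandler handler = new BufferingTextHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.getLines();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

The list returned by BufferingTextHandler.parse(xml) alternates element name and text, in the same order in which the handler above prints them.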


Answer 2:

StAX solution:

Parse the document:

public void parseXML(InputStream xml) {
    try {
        // Read the stream with StAX and copy it into a DOM tree.
        DOMResult result = new DOMResult();
        XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
        XMLEventReader reader = xmlInputFactory.createXMLEventReader(new StreamSource(xml));
        TransformerFactory transFactory = TransformerFactory.newInstance();
        Transformer transformer = transFactory.newTransformer();
        transformer.transform(new StAXSource(reader), result);
        Document document = (Document) result.getNode();

        NodeList startlist = document.getChildNodes();

        processNodeList(startlist);

    } catch (Exception e) {
        System.err.println("Something went wrong, this might help :\n" + e.getMessage());
    }
}

Now the document's top-level nodes are in a NodeList, so walk it recursively:

private void processNodeList(NodeList nodelist) {
    for (int i = 0; i < nodelist.getLength(); i++) {
        if (nodelist.item(i).getNodeType() == Node.ELEMENT_NODE
                && (hasValidAttributes(nodelist.item(i)) || hasValidText(nodelist.item(i)))) {
            getNodeNamesAndValues(nodelist.item(i));
        }
        // Descend into the children of every node.
        processNodeList(nodelist.item(i).getChildNodes());
    }
}

Then, for each element node with valid text, get its name and value:

public void getNodeNamesAndValues(Node n) {

    String nodeValue = null;
    String nodeName = null;

    if (hasValidText(n)) {
        while (n != null && isWhiteSpace(n.getTextContent()) && StringUtils.isWhitespace(n.getTextContent())
                && n.getNodeType() != Node.ELEMENT_NODE) {
            n = n.getFirstChild();
        }

        nodeValue = StringUtils.strip(n.getTextContent());
        nodeName = n.getLocalName();

        System.out.println(nodeName + " " + nodeValue);
    }
}

A few helper methods for checking nodes:

private static boolean hasValidAttributes(Node node) {
    return node.getAttributes().getLength() > 0;
}

private boolean hasValidText(Node node) {
    String textValue = node.getTextContent();

    // Strings are compared with isEmpty(); the != operator would compare references.
    return textValue != null && !textValue.isEmpty() && !isWhiteSpace(textValue)
            && !StringUtils.isWhitespace(textValue) && node.hasChildNodes();
}

// True if the text starts with a whitespace character.
private boolean isWhiteSpace(String nodeText) {
    return nodeText.startsWith("\r") || nodeText.startsWith("\t")
            || nodeText.startsWith("\n") || nodeText.startsWith(" ");
}

I also used StringUtils. If you're using Maven, you can get it by adding this to your pom.xml:

<dependency>
    <groupId>commons-lang</groupId>
    <artifactId>commons-lang</artifactId>
    <version>2.5</version>
</dependency>
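If pulling in commons-lang is not desirable, the same DOM walk can be sketched with the JDK alone. The version below swaps in a plain DocumentBuilder instead of the StAX-to-DOM transform and reports only elements that have no child elements and non-blank text; attribute-only elements are not reported. The class name DomLeafText and the method leafTexts are hypothetical, and getNodeName() is used because getLocalName() returns null when the builder is not namespace-aware.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical JDK-only variant of the DOM walk, not the answer's original code.
class DomLeafText {

    /** Returns "name text" for every element without child elements and with non-blank text. */
    static List<String> leafTexts(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> result = new ArrayList<>();
            collect(document.getDocumentElement(), result);
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    private static void collect(Node node, List<String> result) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return;
        }
        boolean hasChildElement = false;
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                hasChildElement = true;
                collect(children.item(i), result);   // recurse in document order
            }
        }
        // getTextContent() concatenates all descendant text, so only leaf
        // elements are checked to avoid reporting containers.
        String text = node.getTextContent().trim();
        if (!hasChildElement && !text.isEmpty()) {
            result.add(node.getNodeName() + " " + text);
        }
    }
}
```

String.trim() stands in for StringUtils.strip here; the two treat a few exotic whitespace characters differently, which should not matter for input like the sample.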

This is inefficient if you're reading huge files, since the whole document is loaded into memory, but less so if you split them first. This is what I came up with (with Google's help). There are better solutions; this one is mine, and I'm an amateur (for now).