Parse XML & Retrieve Info Nested in Several Layers of Nodes

Posted 2020-05-02 08:52

Question:

I am working from an example provided by my professor which gets data from a weather forecast site and parses the XML file to show the weather conditions in a list. My program is similar, but I want to retrieve information that is nested within several nodes, and I don't know how to get to it. Here is the XML file I'm working from:

<?xml version="1.0" encoding="UTF-8"?> 
<DirectionsResponse> 
 <status>OK</status> 
 <route> 
  <summary>S Street Viaduct</summary> 
  <leg> 
   <step> 
    <travel_mode>DRIVING</travel_mode> 
    <start_location> 
     <lat>40.7021400</lat> 
     <lng>-74.0158200</lng> 
    </start_location> 
    <end_location> 
     <lat>40.7021400</lat> 
     <lng>-74.0158200</lng> 
    </end_location> 
    <polyline> 
     <points>kslwFzewbM</points> 
     <levels>B</levels> 
    </polyline> 
    <duration> 
     <value>0</value> 
     <text>1 min</text> 
    </duration> 
    <html_instructions>Head &lt;b&gt;east&lt;/b&gt; on &lt;b&gt;S Street Viaduct&lt;/b&gt;</html_instructions> 
    <distance> 
     <value>0</value> 
     <text>1 ft</text> 
    </distance> 
   </step> 
   <duration> 
    <value>0</value> 
    <text>1 min</text> 
   </duration> 
   <distance> 
    <value>0</value> 
    <text>1 ft</text> 
   </distance> 
   <start_location> 
    <lat>40.7021400</lat> 
    <lng>-74.0158200</lng> 
   </start_location> 
   <end_location> 
    <lat>40.7021400</lat> 
    <lng>-74.0158200</lng> 
   </end_location> 
   <start_address>S Street Viaduct, New York, NY 10004, USA</start_address> 
   <end_address>S Street Viaduct, New York, NY 10004, USA</end_address> 
  </leg> 
  <copyrights>Map data ©2010 Google, Sanborn</copyrights> 
  <overview_polyline> 
   <points>kslwFzewbM</points> 
   <levels>B</levels> 
  </overview_polyline> 
 </route> 
</DirectionsResponse> 

I'm really only interested in retrieving the info in the "html_instructions" tag, but it is nested in the "route", "leg", and "step" tags. I have seen several tutorials and questions on SO about parsing XML but couldn't seem to find a solution to this. Any direction would be greatly appreciated!

Thanks.

Answer 1:

A SAX parser is a good choice here: it is fast, uses little memory, and lets you skip all the data you don't need. If you are working with SAX for the first time, the following example may be useful. The code is not perfect (it omits exception handling and safe stream closing, for example), but it should be a good starting point.


import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class Test {

  private static final String HTML_INSTRUCTIONS = "html_instructions";

  public static void main(String[] args) throws Exception {
    final List<String> htmlInstructions = new ArrayList<String>();

    SAXParserFactory spf = SAXParserFactory.newInstance();
    SAXParser sp = spf.newSAXParser();
    DefaultHandler dh = new DefaultHandler() {
      private boolean isHtmlInstructions = false;
      private StringBuilder sb = new StringBuilder();
      @Override
      public void startElement(String uri, String localName, String name,
          Attributes attributes) throws SAXException {
        super.startElement(uri, localName, name, attributes);
        if (HTML_INSTRUCTIONS.equals(name)) {
          isHtmlInstructions = true;
        }
      }

      @Override
      public void characters(char ch[], int start, int length)
      throws SAXException {
        if (isHtmlInstructions) {
          sb.append(ch, start, length);
        }
      }

      @Override
      public void endElement(String uri, String localName, String name)
          throws SAXException {
        super.endElement(uri, localName, name);
        if (HTML_INSTRUCTIONS.equals(name)) {
          htmlInstructions.add(sb.toString());
          sb.delete(0, sb.length());
          isHtmlInstructions = false;
        }
      }
    };

    InputStream is = new FileInputStream("test.xml");
    sp.parse(is, dh);
    for (String htmlInstruction : htmlInstructions) {
      System.out.println(htmlInstruction);
    }

  }

}

The output should look like this:


Head <b>east</b> on <b>S Street Viaduct</b>



Answer 2:

Use SAX and pay attention only to the html_instructions element. Your handler's startElement() is called for each element and is passed the element's name. Compare that name to "html_instructions". When it matches, collect the text delivered to characters() until the corresponding endElement() call.
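
If you also want to make sure the element really sits inside a "step" (as it does in the question's XML), one possible variation is to keep a stack of the currently open elements and check the parent before capturing. The following is only a sketch under a few assumptions: the class name InstructionExtractor and the method extract() are made up for illustration, and it relies on the default, non-namespace-aware parser, where qName carries the plain tag name.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class InstructionExtractor {

  // Hypothetical helper: returns the text of every <html_instructions>
  // element whose direct parent is <step>.
  public static List<String> extract(InputStream in) throws Exception {
    final List<String> result = new ArrayList<>();

    DefaultHandler handler = new DefaultHandler() {
      // Names of the elements that are currently open, innermost first.
      private final Deque<String> path = new ArrayDeque<>();
      private final StringBuilder text = new StringBuilder();
      private boolean capturing = false;

      @Override
      public void startElement(String uri, String localName, String qName,
          Attributes attributes) {
        // path.peek() is the parent here, because qName is not pushed yet.
        if ("html_instructions".equals(qName) && "step".equals(path.peek())) {
          capturing = true;
          text.setLength(0);
        }
        path.push(qName);
      }

      @Override
      public void characters(char[] ch, int start, int length) {
        // May be called several times for one element, so append.
        if (capturing) {
          text.append(ch, start, length);
        }
      }

      @Override
      public void endElement(String uri, String localName, String qName) {
        path.pop();
        if (capturing && "html_instructions".equals(qName)) {
          result.add(text.toString());
          capturing = false;
        }
      }
    };

    SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
    return result;
  }

  public static void main(String[] args) throws Exception {
    // Shortened stand-in for the XML from the question.
    String xml = "<DirectionsResponse><route><leg><step>"
        + "<html_instructions>Head &lt;b&gt;east&lt;/b&gt;</html_instructions>"
        + "</step></leg></route></DirectionsResponse>";
    try (InputStream in =
        new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
      for (String instruction : extract(in)) {
        System.out.println(instruction);
      }
    }
  }
}
```

Checking only the direct parent keeps the handler small; if you need the full route/leg/step chain, you could compare more entries of the stack in the same way.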