Not able to retrieve XML tag nested within content

Posted 2019-09-14 02:09

Question:

Thanks for reading!

Using the XML parsing tutorial from here as a reference, I am trying to parse a simple XML RSS feed with the following structure.

Everything works fine and all values are parsed except for the following case: I am not able to get the content of the <img> tag.


<feed>
    <title>This is Title</title>
    <count>10</count>
    <desc>
        This is a description for a sample feed <img src="http://someimagelink.com/img.jpg" />
    </desc>
    <link>This is link</link>
</feed>

This is what the endElement() method looks like:


@Override
public void endElement(String uri, String localName, String qName)
        throws SAXException {
    if(localName.equals("feed")) {
        //Add Records object to ArrayList
        //Feed is a POJO class to store all the feed content.
        //FeedList is an ArrayList to store multiple Feed objects.
        mFeedList.add(mFeed);
    }
    else if(localName.equals("title")) {
        mFeed.setTitle(currentValue.toString());
    }
    else if(localName.equals("count")) {
        mFeed.setCount(currentValue.toString());
    }
    else if(localName.equals("desc")) {
        mFeed.setDesc(currentValue.toString());
    }
    else if(localName.equals("img")) {
        //NEVER hits here :(
        mFeed.setImageUrl(currentValue.toString());
    }
    else if(localName.equals("link")) {
        //BUT, hits here
        mFeed.setLink(currentValue.toString());
    }
}

Since the <img> tag is nested inside the <desc> tag, the code in the else if branch for img never gets executed.

Note: When I read the <desc> tag, I could do a manual String search to retrieve the <img> tag content. But I am sure there has to be a more efficient way.

Can someone guide me on how to get the content of the <img> tag?

Thanks!

EDIT: Updated the <img> tag. It is now closed correctly.

EDIT2: Added the startElement() code below and updated the feed XML.

@Override
public void startElement(String uri, String localName, String qName,
        Attributes attributes) throws SAXException {

    if(localName.equals("feed")) {
        //Instantiate Feed object
        mFeed = new Feed();
    }
    else if(localName.equals("title")) {
        currentValue = new StringBuffer("");
        isBuffering = true;
    }
    else if(localName.equals("count")) {
        currentValue = new StringBuffer("");
        isBuffering = true;
    }
    else if(localName.equals("desc")) {
        currentValue = new StringBuffer("");
        isBuffering = true;
    }
    else if(localName.equals("img")) {
        currentValue = new StringBuffer("");
        isBuffering = true;
    }
    else if(localName.equals("link")) {
        currentValue = new StringBuffer("");
        isBuffering = true;
    }
}

Answer 1:

The <img> tag actually has no character content, and the value you're after has to be pulled out of the attributes.

To do this, you need to override startElement(String namespaceURI, String localName, String qName, Attributes atts), recognize the <img> tag more or less as you're doing, and get the value you need out of the atts parameter.
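As a rough illustration of that approach, here is a minimal, self-contained sketch. The class name ImgSrcExample, the handler ImgSrcHandler, and the helper extractImageUrl are made up for this example and are not part of the question's code. It also assumes the parser may not be namespace-aware, in which case localName is empty and the element name has to be taken from qName.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ImgSrcExample {

    // Hypothetical handler: remembers the src attribute of the first <img> it sees.
    static class ImgSrcHandler extends DefaultHandler {
        String imageUrl;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            // Assumption: localName may be empty when the parser is not
            // namespace-aware, so fall back to qName in that case.
            String name = (localName == null || localName.isEmpty()) ? qName : localName;
            if (name.equals("img") && imageUrl == null) {
                // The URL lives in the attribute list, not in character content.
                imageUrl = attributes.getValue("src");
            }
        }
    }

    // Parses the given XML string and returns the first <img> src, or null if there is none.
    static String extractImageUrl(String xml) throws Exception {
        ImgSrcHandler handler = new ImgSrcHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.imageUrl;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<feed><desc>This is a description for a sample feed "
                + "<img src=\"http://someimagelink.com/img.jpg\" /></desc></feed>";
        System.out.println(extractImageUrl(xml));
    }
}
```

In the question's handler, the equivalent change would be to call mFeed.setImageUrl(attributes.getValue("src")) in the img branch of startElement(), and to avoid resetting currentValue there so the surrounding <desc> text is not discarded.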

Debugging help:

Using this (simple/stupid) handler:

package com.donroby.so;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class DebugHandler extends DefaultHandler {

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes)  throws SAXException {
        printParseInfo("startElement:", uri, localName, qName);
        int attributesLength = attributes.getLength();
        for (int i = 0; i < attributesLength; i++) {
            printAttributeInfo(attributes, i);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName)  throws SAXException {
        printParseInfo("endElement:  ", uri, localName, qName);
    }

    @Override
    public void characters(char[] chars, int start, int length) throws SAXException {
        String str = new String(chars, start, length);

        System.out.println("Characters: '" + str + "'");
    }

    private void printAttributeInfo(Attributes attributes, int i) {
        System.out.println(String.format("%s URI: '%s', localName: '%s', qName: '%s', Value: '%s'", "Attribute ",
                attributes.getURI(i), attributes.getLocalName(i), attributes.getQName(i), attributes.getValue(i)));
    }

    private void printParseInfo(String type, String uri, String localName, String qName) {
        System.out.println(String.format("%s URI: '%s', localName: '%s', qName: '%s'", type, uri, localName, qName));
    }

}

Parsing the sample feed with this handler prints:

startElement: URI: '', localName: '', qName: 'feed'
Characters: '
    '
startElement: URI: '', localName: '', qName: 'title'
Characters: 'This is Title'
endElement:   URI: '', localName: '', qName: 'title'
Characters: '
    '
startElement: URI: '', localName: '', qName: 'count'
Characters: '10'
endElement:   URI: '', localName: '', qName: 'count'
Characters: '
    '
startElement: URI: '', localName: '', qName: 'desc'
Characters: '
        This is a description for a sample feed '
startElement: URI: '', localName: '', qName: 'img'
Attribute  URI: '', localName: 'src', qName: 'src', Value: 'http://someimagelink.com/img.jpg'
endElement:   URI: '', localName: '', qName: 'img'
Characters: '
    '
endElement:   URI: '', localName: '', qName: 'desc'
Characters: '
    '
startElement: URI: '', localName: '', qName: 'link'
Characters: 'This is link'
endElement:   URI: '', localName: '', qName: 'link'
Characters: '
'
endElement:   URI: '', localName: '', qName: 'feed'

This indicates that the <img> tag does indeed generate start and end events.
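The trace above also shows an empty localName for every element, with the tag name appearing only in qName. That is what a parser reports when namespace processing is off, and it may explain why a localName.equals("img") check behaves differently from one environment to another. The sketch below is an illustration only: the class NamespaceAwareDemo and the method startTagNames are invented for this example, and it assumes the parser comes from javax.xml.parsers.SAXParserFactory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class NamespaceAwareDemo {

    // Returns "localName|qName" for every start tag, in document order.
    static List<String> startTagNames(String xml, boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Toggle namespace processing; this controls whether localName is filled in.
        factory.setNamespaceAware(namespaceAware);
        List<String> names = new ArrayList<>();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                names.add(localName + "|" + qName);
            }
        });
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<desc>text <img src=\"http://someimagelink.com/img.jpg\" /></desc>";
        System.out.println(startTagNames(xml, false));
        System.out.println(startTagNames(xml, true));
    }
}
```

If the handler has to work in both configurations, comparing against qName (or falling back to it when localName is empty) is the safer choice.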