Parsing an XML structure with an unknown amount of

Posted 2020-06-18 03:48

Question:

I have to parse an XML structure in Java using the SAX parser.

The problem is that the structure is recursive, with an unspecified number of nesting levels. That alone would not be a big deal; the real difficulty is that I can't use XML namespaces, and the tag names are the same at every level of nesting.

Here is an example of the structure.

<?xml version="1.0" encoding="UTF-8"?>
<RootTag>
    <!-- LOADS OF OTHER TAGS -->
    <Tags attribute="value">
        <Tag attribute="value">
            <SomeOtherTag></SomeOtherTag>
            <Tags attribute="value">
                <Tag attribute="value">
                    <SomeOtherTag></SomeOtherTag>
                    <Tags attribute="value">
                        <!-- MORE OF THE SAME STRUCTURE -->
                    </Tags>
                </Tag>
            </Tags>
        </Tag>
    </Tags>
    <!-- LOADS OF OTHER TAGS -->
</RootTag>

As you can see, the structure nests to an arbitrary, undefined depth. My problem is how to extract the data at every level and store it, for example in a HashMap.

I could define a ContentHandler that takes over whenever a Tags element occurs, have it extract the content into a HashMap, and put that map back into a master HashMap defined in the main content handler, but I'm not sure how to do this.

How do I extract and save the content of a recursive XML structure without using namespaces?

Answer 1:

Check out this set of JavaWorld articles on using SAX. They demonstrate an easy way to parse a recursive XML structure with SAX: a state machine records, for each element, which elements it can contain, and as your ContentHandler traverses the XML it keeps a stack showing which element it is currently in.
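The stack part of that idea can be sketched roughly as follows, using the Tags/Tag structure from the question. This is only a minimal sketch, not the code from the articles: the names RecursiveTagsHandler, TagNode and parse are hypothetical, the state machine of allowed child elements is left out, and it assumes the default JAXP parser, which is not namespace-aware, so element names are read from qName.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: builds a tree of <Tag> elements of any depth.
// Create one handler instance per parse.
class RecursiveTagsHandler extends DefaultHandler {

    /** Hypothetical holder for one <Tag>: its attributes plus its nested <Tag> children. */
    static final class TagNode {
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<TagNode> children = new ArrayList<>();
    }

    // Synthetic parent that collects the top-level <Tag> elements.
    private final TagNode root = new TagNode();
    // The <Tag> elements that are currently open, innermost on top.
    private final Deque<TagNode> open = new ArrayDeque<>();

    TagNode getRoot() {
        return root;
    }

    @Override
    public void startDocument() {
        open.clear();
        open.push(root);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Assumption: parser is not namespace-aware, so qName carries the tag name.
        if (!"Tag".equals(qName)) {
            return; // <Tags>, <SomeOtherTag> etc. are ignored in this sketch
        }
        TagNode node = new TagNode();
        for (int i = 0; i < atts.getLength(); i++) {
            node.attributes.put(atts.getQName(i), atts.getValue(i));
        }
        open.peek().children.add(node); // attach to the nearest enclosing <Tag>
        open.push(node);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("Tag".equals(qName)) {
            open.pop();
        }
    }

    /** Convenience helper (hypothetical): parses an XML string and returns the synthetic root. */
    static TagNode parse(String xml) {
        try {
            RecursiveTagsHandler handler = new RecursiveTagsHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.getRoot();
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }
}
```

Because every Tag is attached to whatever node is on top of the stack, the same two callbacks handle any nesting depth, and no namespaces are needed to tell the levels apart.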



Answer 2:

If you want to parse nested XML with SAX, use a Stack to keep track of your current depth in the XML structure. My XML has this format (the maximum depth is 3):

<Response action='categories'>
    <Categories>
        <Category name='{name}' id='{id}' numSubcategories='{num}'>
            <Category name='{name}' id='{id}' numSubcategories='{num}'>
                <Category name='{name}' id='{id}' numSubcategories='0'/>
                ...
            </Category>
            ...
        </Category>
        ...
    </Categories>
</Response>

I used this Java pseudocode and it works pretty well in my Android app (for a known depth). If you don't know the number of nesting levels in advance, you can adapt my code: in place of the 3 ArrayList objects (and 3 Category objects), use one dynamic collection (for example ArrayList<ArrayList<Category>>) and store each level's ArrayList<Category> in it, using the value returned by getDepth() as the index.

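import java.util.ArrayList;
import java.util.Stack;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;

import android.util.Log;

// Response and Category are the app's own classes and are not shown here.
public class CategoriesResponse extends Response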
{
    private Stack<String> mTagStack = new Stack<String>();
    private ArrayList<Category> mCategories1;
    private ArrayList<Category> mCategories2;
    private ArrayList<Category> mCategories3;
    Category mCategory1;
    Category mCategory2;
    Category mCategory3;
    private int mCurrentDepth = 0;


    public ArrayList<Category> getCategories()
    {
        return mCategories1;
    }


    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException
    {
        super.startElement(uri, localName, qName, attributes);

        ...

        if(localName.equals("Category"))
        {
            // push element into the stack
            mTagStack.push(localName);

            // get data
            int id = Integer.parseInt(attributes.getValue("id"));
            String name = attributes.getValue("name");
            int numSubcategories = Integer.parseInt(attributes.getValue("numSubcategories"));

            // create new Category
            if(getDepth()==1) 
            {
                mCategory1 = new Category(id, name);
                mCategory1.setSubcategoriesSize(numSubcategories);
                mCategory1.setSubcategories(null);
                if(mCurrentDepth<getDepth()) mCategories1 = new ArrayList<Category>(); // descending, so create a new list
            }
            else if(getDepth()==2) 
            {
                mCategory2 = new Category(id, name);
                mCategory2.setSubcategoriesSize(numSubcategories);
                mCategory2.setSubcategories(null);
                if(mCurrentDepth<getDepth()) mCategories2 = new ArrayList<Category>(); // descending, so create a new list
            }
            else if(getDepth()==3) 
            {
                mCategory3 = new Category(id, name);
                mCategory3.setSubcategoriesSize(numSubcategories);
                mCategory3.setSubcategories(null);
                if(mCurrentDepth<getDepth()) mCategories3 = new ArrayList<Category>(); // descending, so create a new list
            }

            // debug output
            if(mCurrentDepth<getDepth()) Log.d("SAX_TEST", getPath() + " | " + getDepth() + " | DEEPING DOWN");
            else if(mCurrentDepth>getDepth()) Log.d("SAX_TEST", getPath() + " | " + getDepth() + " | DEEPING UP");
            else if(mCurrentDepth==getDepth()) Log.d("SAX_TEST", getPath() + " | " + getDepth() + " | STAYING");

            // set current depth
            mCurrentDepth = getDepth();
            return;
        }
    }


    public void characters(char[] ch, int start, int length) throws SAXException
    {
        super.characters(ch, start, length);
        ...
    }


    public void endElement(String uri, String localName, String qName) throws SAXException
    {
        super.endElement(uri, localName, qName);

        ...

        if(localName.equals("Category"))
        {
            // debug output
            Log.d("SAX_TEST", "END OF THE ELEMENT IN DEPTH " + getDepth() + " | " + mCurrentDepth);

            // ascending, so attach the finished sublist to the current category
            if(getDepth()!=mCurrentDepth)
            {
                if(getDepth()==2) mCategory2.setSubcategories(mCategories3);
                if(getDepth()==1) mCategory1.setSubcategories(mCategories2);
            }

            // add current category to list
            if(getDepth()==1) 
            {
                mCategories1.add(mCategory1);
            }
            else if(getDepth()==2) 
            {
                mCategories2.add(mCategory2);
            }
            else if(getDepth()==3)
            {
                mCategories3.add(mCategory3);
            }

            // pop element from stack
            mTagStack.pop();
            return;
        }
    }


    // debug output - build the current path, e.g. /Category/Category
    private String getPath()
    {
        StringBuilder buffer = new StringBuilder();
        for (String tag : mTagStack) // a Stack iterates from bottom to top
        {
            buffer.append("/").append(tag);
        }
        return buffer.toString();
    }


    // get current depth of stack
    private int getDepth()
    {
        return mTagStack.size();
    }
}
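
For an unknown depth, one possible variant is sketched below. Note that it swaps the list-of-lists indexed by getDepth() suggested above for a stack of the Category objects that are currently open, which serves the same purpose with less bookkeeping. The Category class here is a hypothetical stand-in for the one used in the answer (which is not shown), CategoryTreeHandler and parse are made-up names, and the sketch assumes a plain JAXP parser that is not namespace-aware, so it reads tag names from qName instead of localName.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical stand-in for the Category class, which the answer does not show. */
class Category {
    final int id;
    final String name;
    final List<Category> subcategories = new ArrayList<>();

    Category(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Hypothetical handler: collects <Category> elements nested to any depth.
class CategoryTreeHandler extends DefaultHandler {
    private final List<Category> topLevel = new ArrayList<>();
    // One entry per <Category> that is currently open; its size is the current depth.
    private final Deque<Category> open = new ArrayDeque<>();

    List<Category> getCategories() {
        return topLevel;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Assumption: parser is not namespace-aware, so qName carries the tag name.
        if (!"Category".equals(qName)) {
            return;
        }
        Category category = new Category(
                Integer.parseInt(atts.getValue("id")), atts.getValue("name"));
        if (open.isEmpty()) {
            topLevel.add(category);                  // depth 1: goes into the result list
        } else {
            open.peek().subcategories.add(category); // deeper: goes into its parent
        }
        open.push(category);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("Category".equals(qName)) {
            open.pop();
        }
    }

    /** Convenience helper (hypothetical): parses an XML string into a category tree. */
    static List<Category> parse(String xml) {
        try {
            CategoryTreeHandler handler = new CategoryTreeHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.getCategories();
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }
}
```

Since each category is attached to its parent as soon as its start tag is seen, endElement only has to pop the stack, and the per-depth fields and the mCurrentDepth comparison are no longer needed.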