Parse XML file to get all Namespace information

Posted 2019-04-09 18:07

I want to be able to get all namespace information from a given XML File.

So for example, if the input XML File is something like:

<ns1:create xmlns:ns1="http://predic8.com/wsdl/material/ArticleService/1/">
   <ns1:article xmlns:ns1="xmlns:ns1='http://predic8.com/material/1/">
      <ns1:id>1</ns1:id>
      <description>bar</description>
      <name>foo</name>
      <ns1:price>
         <amount>00.00</amount>
         <currency>USD</currency>
      </ns1:price>
      <ns1:price>
         <amount>11.11</amount>
         <currency>AUD</currency>
      </ns1:price>
   </ns1:article>
   <ns1:article xmlns:ns1="xmlns:ns1='http://predic8.com/material/1/">
      <ns1:id>2</ns1:id>
      <description>some name</description>
      <name>some description</name>
      <ns1:price>
         <amount>00.01</amount>
         <currency>USD</currency>
      </ns1:price>
   </ns1:article>
</ns1:create>

I would expect output that looks something like this (comma-separated in this case):

create, ns1, http://predic8.com/wsdl/material/ArticleService/1/
article, ns1, http://predic8.com/material/1/
price, ns1, http://predic8.com/material/1/
id, ns1, http://predic8.com/material/1/

Important notes:

Sub-nodes that use a namespace prefix must also be picked up, even when the prefix is declared on a higher node. For example, we still want the node ns1:id, which means tracing back to the parent node ns1:article to discover that the namespace URI is xmlns:ns1='http://predic8.com/material/1/

I am implementing this in Java, so a Java-based solution would be fine, and an XSLT-based solution might also be appropriate.

3 answers
爷的心禁止访问
Reply #2 · 2019-04-09 18:16

This can be done with a single XPath 2.0 expression:

distinct-values(//*[name()!=local-name()]/
   concat(local-name(), ', ', substring-before(name(), ':'), ', ', namespace-uri()))
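
Since the question asks for Java: the XPath engine bundled with the JDK (javax.xml.xpath) implements XPath 1.0 only, so distinct-values() and the trailing /concat(...) step are not available there, and evaluating the expression above as written needs an XPath 2.0 processor such as Saxon. A minimal sketch that stays inside the JDK, assuming a namespace-aware DOM parse, is to let XPath 1.0 select the prefixed elements and do the formatting and de-duplication in Java. The class name PrefixedElementLister and the sample XML in main are hypothetical, chosen only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class; the name is illustrative only.
class PrefixedElementLister {

    static Set<String> list(InputStream xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for prefix and namespace lookups
        Document doc = dbf.newDocumentBuilder().parse(xml);

        // XPath 1.0 can still select every element whose name carries a prefix.
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//*[name() != local-name()]", doc, XPathConstants.NODESET);

        // The set plays the role of distinct-values(): repeated rows are dropped.
        Set<String> rows = new LinkedHashSet<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node n = nodes.item(i);
            rows.add(n.getLocalName() + ", " + n.getPrefix() + ", " + n.getNamespaceURI());
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample input: the prefix "a" is rebound on the inner element.
        String xml = "<a:root xmlns:a='urn:a'>"
                + "<a:item xmlns:a='urn:b'><a:id/><name/></a:item></a:root>";
        list(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .forEach(System.out::println);
    }
}
```

The namespace URI reported for each element is the one in scope at that element, so a prefix that is redeclared on a descendant resolves to the inner declaration, which is the behaviour the question asks for.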
我欲成王,谁敢阻挡
Reply #3 · 2019-04-09 18:29

This develops the XPath expression proposed by Michael Kay further (and actually simplifies it) so that it also handles unprefixed element names that belong to a default namespace:

distinct-values(//*[namespace-uri()]
                    /concat(local-name(),
                            ', ',
                            substring-before(name(), ':'),
                            ', ',
                            namespace-uri(),
                            '&#xA;'
                            )
                )

When this XPath expression is evaluated on the following document (the provided one but with an added element that is in a default namespace):

<ns1:create xmlns:ns1="http://predic8.com/wsdl/material/ArticleService/1/">
    <ns1:article xmlns:ns1="xmlns:ns1='http://predic8.com/material/1/">
        <ns1:id>1</ns1:id>
        <description>bar</description>
        <name>foo</name>
        <ns1:price>
            <amount>00.00</amount>
            <currency>USD</currency>
        </ns1:price>
        <ns1:price>
            <amount>11.11</amount>
            <currency>AUD</currency>
        </ns1:price>
    </ns1:article>
    <ns1:article xmlns:ns1="xmlns:ns1='http://predic8.com/material/1/">
        <ns1:id>2</ns1:id>
        <description>some name</description>
        <name>some description</name>
        <ns1:price>
            <amount>00.01</amount>
            <currency>USD</currency>
        </ns1:price>
        <quality xmlns="my:q">high</quality>
    </ns1:article>
</ns1:create>

the wanted, correct result is produced:

 create, ns1, http://predic8.com/wsdl/material/ArticleService/1/
 article, ns1, xmlns:ns1='http://predic8.com/material/1/
 id, ns1, xmlns:ns1='http://predic8.com/material/1/
 price, ns1, xmlns:ns1='http://predic8.com/material/1/
 quality, , my:q

A further, slight improvement is also to produce the namespace data for attribute names:

distinct-values(//(*|@*)[namespace-uri()]
                    /concat(if(. intersect ../@*)
                              then '@'
                              else (),
                            local-name(),
                            ', ',
                            substring-before(name(), ':'),
                            ', ',
                            namespace-uri(),
                            '&#xA;'
                            )
                )

When this XPath expression is evaluated on the following XML document (the previous one, with an xml:lang attribute added to one of the article elements):

<ns1:create xmlns:ns1="http://predic8.com/wsdl/material/ArticleService/1/">
    <ns1:article xml:lang="en-us" xmlns:ns1="xmlns:ns1='http://predic8.com/material/1/">
        <ns1:id>1</ns1:id>
        <description>bar</description>
        <name>foo</name>
        <ns1:price>
            <amount>00.00</amount>
            <currency>USD</currency>
        </ns1:price>
        <ns1:price>
            <amount>11.11</amount>
            <currency>AUD</currency>
        </ns1:price>
    </ns1:article>
    <ns1:article xmlns:ns1="xmlns:ns1='http://predic8.com/material/1/">
        <ns1:id>2</ns1:id>
        <description>some name</description>
        <name>some description</name>
        <ns1:price>
            <amount>00.01</amount>
            <currency>USD</currency>
        </ns1:price>
        <quality xmlns="my:q">high</quality>
    </ns1:article>
</ns1:create>

again the correct result is produced:

 create, ns1, http://predic8.com/wsdl/material/ArticleService/1/
 article, ns1, xmlns:ns1='http://predic8.com/material/1/
 @lang, xml, http://www.w3.org/XML/1998/namespace
 id, ns1, xmlns:ns1='http://predic8.com/material/1/
 price, ns1, xmlns:ns1='http://predic8.com/material/1/
 quality, , my:q
时光不老,我们不散
Reply #4 · 2019-04-09 18:36

I would use the built-in XMLStreamReader, which is the interface implemented by the streaming XML parser (get to it from the XMLInputFactory class). Its getName method returns a QName, which should give you everything you need.

Something along the lines of:

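import java.io.File;
import java.io.FileInputStream;
import java.util.HashSet;
import java.util.Set;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Wrapper class added so the snippet compiles; the class name is arbitrary.
public class NamespaceLister {
    public static void main(String[] args) throws Exception {
        File file = new File("samples/sample11.xml");
        XMLInputFactory inputFactory = XMLInputFactory.newInstance();
        // Each distinct "localName, prefix, namespaceURI" line is stored once.
        Set<String> namespaces = new HashSet<>();
        try (FileInputStream in = new FileInputStream(file)) {
            XMLStreamReader reader = inputFactory.createXMLStreamReader(in);
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    QName qName = reader.getName();
                    // Only prefixed element names are reported; unprefixed ones are skipped.
                    if (!qName.getPrefix().isEmpty()) {
                        namespaces.add(String.format("%s, %s, %s",
                            qName.getLocalPart(), qName.getPrefix(), qName.getNamespaceURI()));
                    }
                }
            }
            reader.close();
        }
        for (String namespace : namespaces) {
            System.out.println(namespace);
        }
    }
}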
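
The loop above reports only prefixed element names. As a hedged sketch of how the same StAX approach could also cover elements in a default namespace and namespaced attributes (the cases handled by the XPath answer above), the variant below filters on the namespace URI instead of the prefix and marks attributes with '@'. The class name NamespacedNameLister, the helper addRow and the sample XML are hypothetical and not part of the original answer.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class; the name and output format are illustrative only.
class NamespacedNameLister {

    static Set<String> list(InputStream xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        Set<String> rows = new LinkedHashSet<>(); // keeps first-seen order, drops repeats
        try {
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                addRow(rows, "", reader.getName());
                // The attribute count excludes namespace declarations,
                // so xmlns and xmlns:* never show up here.
                for (int i = 0; i < reader.getAttributeCount(); i++) {
                    addRow(rows, "@", reader.getAttributeName(i));
                }
            }
        } finally {
            reader.close();
        }
        return rows;
    }

    // Records a name only when it belongs to some namespace.
    private static void addRow(Set<String> rows, String marker, QName name) {
        if (!name.getNamespaceURI().isEmpty()) {
            rows.add(marker + name.getLocalPart() + ", "
                    + name.getPrefix() + ", " + name.getNamespaceURI());
        }
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample input with a prefixed attribute and a default-namespace element.
        String xml = "<a:root xmlns:a='urn:a'>"
                + "<a:item xmlns:b='urn:b' b:flag='1' plain='x'>"
                + "<q xmlns='urn:q'/><name/></a:item></a:root>";
        list(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .forEach(System.out::println);
    }
}
```

Filtering on the namespace URI rather than the prefix is what lets an element such as quality in the earlier example be reported with an empty prefix column.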