How to insert a namespace and prefixes into an XML string

Posted 2019-01-20 12:41

Question:

Suppose I have an XML string:

<A>
    <B foo="123">
        <C>thing</C>
        <D>stuff</D>
    </B>
</A>

and I want to insert a namespace of the type used by XML Schema, declaring it on the root element and putting a prefix in front of the names of all the elements inside it:

<A xmlns:ns1="www.example.com">
    <ns1:B foo="123">
        <ns1:C>thing</ns1:C>
        <ns1:D>stuff</ns1:D>
    </ns1:B>
</A>

Is there a way to do this (aside from brute-force find-replace or regex) using lxml.etree or a similar library?

Answer 1:

I don't think this can be done with just ElementTree.

Manipulating namespaces is sometimes surprisingly tricky. There are many questions about it here on SO. Even with the more advanced lxml library, it can be really hard. See these related questions:

  • lxml: add namespace to input file
  • Modify namespaces in a given xml document with lxml
  • lxml etree xmlparser remove unwanted namespace

Below is a solution that uses XSLT.

Code:

from lxml import etree

XML = '''
<A>
    <B foo="123">
        <C>thing</C>
        <D>stuff</D>
    </B>
</A>'''

XSLT = '''
<xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:ns1="www.example.com">
 <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>

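  <xsl:template match="*">
   <xsl:element name="ns1:{name()}">
    <xsl:apply-templates select="node()|@*"/>
   </xsl:element>
  </xsl:template>

  <!-- Copy attributes as attributes; without this template the
       built-in rule would output their values as text -->
  <xsl:template match="@*">
   <xsl:copy/>
  </xsl:template>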

  <!-- No prefix on the A element -->
  <xsl:template match="A">
   <A xmlns:ns1="www.example.com">
    <xsl:apply-templates select="node()|@*"/>
   </A>
  </xsl:template>
</xsl:stylesheet>'''

xml_doc = etree.fromstring(XML)
xslt_doc = etree.fromstring(XSLT)
transform = etree.XSLT(xslt_doc)
print(str(transform(xml_doc)))  # Python 3; str() serializes the result tree

Output:

<A xmlns:ns1="www.example.com">
    <ns1:B foo="123">
        <ns1:C>thing</ns1:C>
        <ns1:D>stuff</ns1:D>
    </ns1:B>
</A>
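To check that the output above really places B, C and D in the www.example.com namespace (rather than just having "ns1:" pasted in front of the names), one can parse it back and inspect the tag names. This is a minimal sketch using the standard library's xml.etree.ElementTree, which reports namespaced names in the {uri}localname form; the OUTPUT string is simply the output shown above.

```python
import xml.etree.ElementTree as ET

# The transformed document shown above, as a string
OUTPUT = '''<A xmlns:ns1="www.example.com">
    <ns1:B foo="123">
        <ns1:C>thing</ns1:C>
        <ns1:D>stuff</ns1:D>
    </ns1:B>
</A>'''

root = ET.fromstring(OUTPUT)

# After parsing, the prefix is gone: each name carries the namespace URI instead
tags = [el.tag for el in root.iter()]
print(tags)
# ['A', '{www.example.com}B', '{www.example.com}C', '{www.example.com}D']

# Unprefixed attributes are in no namespace, so 'foo' keeps its plain name
b = root.find('{www.example.com}B')
print(b.get('foo'))  # 123
```

A plain root.find('B') returns None here, because the element's name is now {www.example.com}B.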


Answer 2:

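Use ET.register_namespace() to register the namespace with ElementTree. This is needed so that write() (and tostring()) serialize the namespace with your chosen prefix instead of an auto-generated one such as ns0. Note that the standard library's xml.etree.ElementTree reserves prefixes of the form "ns" followed by digits, so ET.register_namespace('ns1', 'www.example.com') raises ValueError; pick a different prefix. (I have code that uses a prefix of '' (an empty string) for the default namespace.)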

Then prefix each element name with {www.example.com}, so that B becomes {www.example.com}B. Lookups use the same form, for example: root.find('{www.example.com}B').
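The two steps above can be combined into a minimal sketch that uses only the standard library's xml.etree.ElementTree. It assumes the root element should stay unprefixed, as in the question, and it uses "ex" as a stand-in prefix because the standard library rejects "ns1"; both choices are illustrative and not part of the original answer.

```python
import xml.etree.ElementTree as ET

XML = '''<A>
    <B foo="123">
        <C>thing</C>
        <D>stuff</D>
    </B>
</A>'''

NS = 'www.example.com'

# 'ex' is an illustrative prefix: the stdlib reserves ns<digits> names,
# so ET.register_namespace('ns1', NS) would raise ValueError.
ET.register_namespace('ex', NS)

root = ET.fromstring(XML)
for elem in root.iter():
    if elem is not root:  # keep <A> unprefixed, as in the question
        # Put the element in the namespace: '{uri}localname'
        elem.tag = '{%s}%s' % (NS, elem.tag)

# Searches must now use the namespaced name as well
b = root.find('{%s}B' % NS)
print(b.get('foo'))  # 123

out = ET.tostring(root, encoding='unicode')
print(out)
# <A xmlns:ex="www.example.com">
#     <ex:B foo="123">
#         <ex:C>thing</ex:C>
#         <ex:D>stuff</ex:D>
#     </ex:B>
# </A>
```

ElementTree writes all namespace declarations on the root element when serializing, which is why xmlns:ex ends up on A even though A itself is not in the namespace. The attribute foo is left untouched, since only element tags are rewritten.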