Python lxml: how do I use a prefix in an element name?

Posted 2020-04-23 03:35

Question:

I need to build an XML file whose elements have special names. This is my current code:

from lxml import etree
import lxml
from lxml.builder import E

wp = E.wp

tmp = wp("title")

print(etree.tostring(tmp))

The current output is:

b'<wp>title</wp>'

I want it to be:

b'<wp:title>title</wp:title>'

How can I create elements with names like wp:title?

Answer 1:

You have confused the namespace prefix wp with the tag name. The namespace prefix is a document-local name for a namespace URI. wp:title requires a parser to look for an xmlns:wp="..." attribute to find the namespace itself (usually a URL, but any globally unique string would do), either on the tag itself or on a parent tag. This connects tags to a unique value without making tag names too verbose to type out or read.
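To make the distinction concrete, here is a small sketch of how lxml sees a prefixed tag after parsing. It assumes the WordPress export URI for wp (the variable names and the second prefix, export, are made up for illustration):

```python
from lxml import etree

# Assumed namespace URI; the prefix is only a document-local alias for it.
WP_URI = "http://wordpress.org/export/1.2/"

title = etree.fromstring(
    '<wp:title xmlns:wp="http://wordpress.org/export/1.2/">Hello</wp:title>'
)

# lxml stores the name in "Clark notation": {namespace-uri}localname
print(title.tag)
print(title.prefix)

# QName splits the qualified name into its two parts.
qname = etree.QName(title)
print(qname.namespace, qname.localname)

# The same element written with a different (hypothetical) prefix
# has exactly the same qualified tag name.
other = etree.fromstring(
    '<export:title xmlns:export="http://wordpress.org/export/1.2/">Hello</export:title>'
)
same_name = title.tag == other.tag
print(same_name)
```

The prefix only affects how the document is written out; comparisons and lookups go by the namespace URI plus the local name.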

You need to provide the namespace, and optionally, the namespace mapping (mapping short names to full namespace names) to the element maker object. The default E object provided doesn't have a namespace or namespace map set. I'm going to assume here that wp is the http://wordpress.org/export/1.2/ WordPress namespace, as that seems the most likely, although it could also be that you are trying to send Windows Phone notifications.

Instead of using the default E element maker, create your own ElementMaker instance and pass it a namespace argument to tell lxml what URL the element belongs to. To get the right prefix on your element names, you also need to give it a nsmap dictionary that maps prefixes to URLs:

from lxml.builder import ElementMaker

namespaces = {"wp": "http://wordpress.org/export/1.2/"}
E = ElementMaker(namespace=namespaces["wp"], nsmap=namespaces)

title = E.title("Value of the wp:title tag")

This produces a tag with both the correct prefix, and the xmlns:wp attribute:

>>> from lxml import etree
>>> from lxml.builder import ElementMaker
>>> namespaces = {"wp": "http://wordpress.org/export/1.2/"}
>>> E = ElementMaker(namespace=namespaces["wp"], nsmap=namespaces)
>>> title = E.title("Value of the wp:title tag")
>>> etree.tostring(title, encoding="unicode")
'<wp:title xmlns:wp="http://wordpress.org/export/1.2/">Value of the wp:title tag</wp:title>'

You can omit the nsmap value, but then you'd want to have such a map on a parent element of the document. In that case, you probably want to make a separate ElementMaker object for each namespace you need to support, and put the nsmap namespace mapping on the outer-most element. When writing out the document, lxml then uses the short names throughout.

For example, creating a WordPress WXR format document would require a number of namespaces:

from lxml.builder import ElementMaker

namespaces = {
    "excerpt": "http://wordpress.org/export/1.2/excerpt/",
    "content": "http://purl.org/rss/1.0/modules/content/",
    "wfw": "http://wellformedweb.org/CommentAPI/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "wp": "http://wordpress.org/export/1.2/",
}

RootElement = ElementMaker(nsmap=namespaces)
ExcerptElement = ElementMaker(namespace=namespaces["excerpt"])
ContentElement = ElementMaker(namespace=namespaces["content"])
CommentElement = ElementMaker(namespace=namespaces["wfw"])
DublinCoreElement = ElementMaker(namespace=namespaces["dc"])
ExportElement = ElementMaker(namespace=namespaces["wp"])

and then you'd construct a document with

doc = RootElement.rss(
    RootElement.channel(
        ExportElement.wxr_version("1.2"),
        # etc. ...
    ),
    version="2.0"
)

which, when pretty printed with etree.tostring(doc, pretty_print=True, encoding="unicode"), produces:

<rss xmlns:excerpt="http://wordpress.org/export/1.2/excerpt/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:wp="http://wordpress.org/export/1.2/" version="2.0">
  <channel>
    <wp:wxr_version>1.2</wp:wxr_version>
  </channel>
</rss>

Note how only the root <rss> element has xmlns attributes, and how the <wp:wxr_version> tag uses the right prefix even though we only gave it the namespace URI.
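The prefix lookup described above can be checked with a cut-down sketch that uses only the Dublin Core namespace. The maker names Root and DublinCore are made up for this illustration, and the exact auto-generated prefix on the standalone element is an lxml implementation detail (typically something like ns0):

```python
from lxml import etree
from lxml.builder import ElementMaker

DC_URI = "http://purl.org/dc/elements/1.1/"

# Root carries the prefix map; DublinCore only knows the namespace URI.
Root = ElementMaker(nsmap={"dc": DC_URI})
DublinCore = ElementMaker(namespace=DC_URI)

# On its own, the element has no "dc" prefix to draw on, so lxml has to
# invent a prefix and declare the namespace on the element itself.
standalone = DublinCore.creator("admin")
standalone_xml = etree.tostring(standalone, encoding="unicode")
print(standalone_xml)

# Inside a tree whose root maps "dc" to the same URI, the child picks up
# that prefix and needs no declaration of its own.
doc = Root.rss(
    Root.channel(
        DublinCore.creator("admin"),
    ),
    version="2.0",
)
doc_xml = etree.tostring(doc, encoding="unicode")
print(doc_xml)
```

In both cases the element's tag is the same qualified name; only the serialized prefix differs.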

To give a different example, if you are building a Windows Phone tile notification, it'd be simpler. After all, there is just a single namespace to use:

from lxml.builder import ElementMaker

namespaces = {"wp": "WPNotification"}
E = ElementMaker(namespace=namespaces["wp"], nsmap=namespaces)

notification = E.Notification(
    E.Tile(
        E.BackgroundImage("https://example.com/someimage.png"),
        E.Count("42"),
        E.Title("The notification title"),
        # ...
    )
)

which produces

<wp:Notification xmlns:wp="WPNotification">
  <wp:Tile>
    <wp:BackgroundImage>https://example.com/someimage.png</wp:BackgroundImage>
    <wp:Count>42</wp:Count>
    <wp:Title>The notification title</wp:Title>
  </wp:Tile>
</wp:Notification>

Only the outer-most element, <wp:Notification>, now has the xmlns:wp attribute. All other elements only need to include the wp: prefix.

Note that the prefix used is entirely up to you and even optional. It is the namespace URI that is the real key to uniquely identifying elements across different XML documents. If you used E = ElementMaker(namespace="WPNotification", nsmap={None: "WPNotification"}) instead, and so produced a top-level element with <Notification xmlns="WPNotification">, you would still have a perfectly legal XML document that, according to the XML standard, has the exact same meaning.
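As a quick check of that equivalence, the following sketch builds the same element once with the wp prefix and once with a default namespace, then compares the qualified tag names. The variable names are illustrative only:

```python
from lxml import etree
from lxml.builder import ElementMaker

NS = "WPNotification"

# Same namespace URI, once bound to the "wp" prefix and once bound as the
# default namespace (the None key in nsmap).
Prefixed = ElementMaker(namespace=NS, nsmap={"wp": NS})
Default = ElementMaker(namespace=NS, nsmap={None: NS})

prefixed = Prefixed.Notification()
default = Default.Notification()

prefixed_xml = etree.tostring(prefixed, encoding="unicode")
default_xml = etree.tostring(default, encoding="unicode")
print(prefixed_xml)
print(default_xml)

# The serialized text differs, but the qualified names are identical.
same_name = prefixed.tag == default.tag
print(same_name)
```

Any namespace-aware consumer should treat the two documents the same way, since it matches on the URI rather than on the prefix.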