Python XPath lxml could not read SVG path element

Posted 2019-07-23 16:28

Question:

I have an SVG (XML) file from which I want to select some elements. For the sake of an MCRE I have cut the file down to this:

<svg >
    <!-- xmlns:svg="http://www.w3.org/2000/svg" xmlns="http://www.w3.org/2000/svg" -->
    <g>
       <path style="fill:#19518b;fill-opacity:1;fill-rule:nonzero;stroke:none" />
       <path style="fill:#a80c3d;fill-opacity:1;fill-rule:nonzero;stroke:none" />
       <path style="fill:#a98b6e;fill-opacity:1;fill-rule:nonzero;stroke:none" />
   </g>
</svg>

The optional namespace attributes for the root element are placed in a comment so they can be inserted back in to replicate the real scenario (where the SVG root element is fully attributed).

From the XML (SVG) above I want to select the elements styled with fill:#19518b;fill-opacity:1;fill-rule:nonzero;stroke:none. There is one match. The following code works on the given XML.

from lxml import etree
sFileName = 'C:/Users/Simon/Downloads/pdf_skunkworks/inflation-report-may-2018-page6 - Copy.svg'

tree = etree.parse(sFileName)
svgNamespace = "xmlns:svg='http://www.w3.org/2000/svg'"
#xpath = r"//svg:path[@style='fill:#19518b;fill-opacity:1;fill-rule:nonzero;stroke:none']"
xpath = r"//path[@style='fill:#19518b;fill-opacity:1;fill-rule:nonzero;stroke:none']"
print(xpath)
#bluePaths = tree.xpath(xpath,namespaces={   'svg': svgNamespace  })
bluePaths = tree.xpath(xpath)

print(bluePaths[0])

But it works on the given XML only because it does not have the namespace attributes that one finds in a real SVG file. Once I reinsert the namespace attributes thus:

<svg xmlns:svg="http://www.w3.org/2000/svg" xmlns="http://www.w3.org/2000/svg"    >

then the Python code (as given) fails. I know I need to use namespaces, and you can see my attempts commented out in the Python, but they don't work. One of the namespace prefixes is an empty string, which cannot be passed in the namespace dictionary.

Anyway, in the morning I will write code to clone the SVG file and remove the attributes from the root element, because I know this approach works. In the meantime, if someone can figure out the real way to solve this then I'd be grateful (cloning files seems suboptimal).

P.S. The SVG is created by running Inkscape from the command line: I give it a single-page PDF and ask for a plain SVG export.

Answer 1:

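It's because the namespace URI is just http://www.w3.org/2000/svg. The xmlns:svg='...' part is the attribute syntax used to declare the namespace in the document, not part of the URI itself.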

Change:

svgNamespace = "xmlns:svg='http://www.w3.org/2000/svg'"

to:

svgNamespace = "http://www.w3.org/2000/svg"
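
With that change, the commented-out lines in the question should work once they are re-enabled. Below is a minimal sketch of the whole thing. It assumes the cut-down SVG from the question is embedded as a string instead of being read from the file, and the names SVG_NS, svg_source and blue_paths are only illustrative. The prefix used in the XPath expression does not have to match any prefix in the document. It only needs to map to the same URI, so the elements in the default (unprefixed) namespace are matched as well.

```python
from lxml import etree

# The namespace URI only, without any xmlns:... attribute syntax.
SVG_NS = "http://www.w3.org/2000/svg"

# Assumption: the cut-down SVG from the question, with the namespace
# attributes restored on the root element.
svg_source = b"""<svg xmlns:svg="http://www.w3.org/2000/svg" xmlns="http://www.w3.org/2000/svg">
    <g>
       <path style="fill:#19518b;fill-opacity:1;fill-rule:nonzero;stroke:none" />
       <path style="fill:#a80c3d;fill-opacity:1;fill-rule:nonzero;stroke:none" />
       <path style="fill:#a98b6e;fill-opacity:1;fill-rule:nonzero;stroke:none" />
    </g>
</svg>"""

root = etree.fromstring(svg_source)

# 'svg' here is a prefix chosen for the XPath expression; it is bound to
# the URI through the namespaces dictionary, not through the document.
xpath = "//svg:path[@style='fill:#19518b;fill-opacity:1;fill-rule:nonzero;stroke:none']"
blue_paths = root.xpath(xpath, namespaces={"svg": SVG_NS})

print(len(blue_paths))    # 1
print(blue_paths[0].tag)  # {http://www.w3.org/2000/svg}path
```

For the real file, etree.parse(sFileName) followed by tree.xpath(xpath, namespaces={'svg': svgNamespace}) should behave the same way, so there is no need to clone the file and strip the root attributes.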