I'm trying to parse .svg files from http://kanjivg.tagaini.net/, but I can't extract the information inside them.
Edit 1: (full file) http://www.filedropper.com/0f9ab
Part of 0f9ab.svg looks like this:
<svg xmlns="http://www.w3.org/2000/svg" width="109" height="109" viewBox="0 0 109 109">
<g id="kvg:StrokePaths_0f9ab" style="fill:none;stroke:#000000;stroke-width:3;stroke-linecap:round;stroke-linejoin:round;">
<g id="kvg:0f9ab" kvg:element="嶺">
<g id="kvg:0f9ab-g1" kvg:element="山" kvg:position="top" kvg:radical="general">
<path id="kvg:0f9ab-s1" kvg:type="㇑a" d="M53.26,9.38c0.99,0.99,1.12,2.09,1.12,3.12c0,0.67,0.06,8.38,0.06,13.01"/>
<path id="kvg:0f9ab-s2" kvg:type="㇄a"
</g>
</g>
</g>
My .py file:
import lxml.etree as ET
svg = ET.parse('0f9ab.svg')
print(svg) # <lxml.etree._ElementTree object at 0x7f3a2f659ec8>
# AttributeError: 'lxml.etree._ElementTree' object has no attribute 'tag'
print(svg.tag)
# TypeError: 'lxml.etree._ElementTree' object is not subscriptable
print(svg[0])
# TypeError: 'lxml.etree._ElementTree' object is not iterable
for child in svg:
    print(child)
# None
print(svg.find("./svg"))
# []
print(svg.findall("//g"))
# []
print(svg.xpath("//g"))
Purpose
I tried all kinds of operations I could think of, but nothing gets me any data from the .svg file.
I want to extract the kanji (Japanese characters) stored in the kvg:element="kanji" attributes, which appear at different depth levels.
Question
- Is lxml the wrong package for this?
- If not, how do I extract information from my parsed .svg file?
Other solution
- Of course, I could just read the file as a string and search for kvg:element=", but I would like to use the proper way of extracting data from XML / SVG.
- I used xmltodict before, but my code became really messy when extracting kvg:element, because the attributes were at different depth levels.
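For reference, the string-search fallback I mean would look roughly like the sketch below. It is only a rough illustration, not the parser-based solution I'm after: the helper name kanji_from_text and the inline sample are made up for this example, and it assumes the attribute values are always wrapped in double quotes, as in the excerpt above.

```python
import re

# Matches kvg:element="..." and captures the value between the double quotes.
# Assumption: values never contain a double quote themselves.
KVG_ELEMENT = re.compile(r'kvg:element="([^"]*)"')


def kanji_from_text(svg_text):
    """Return every kvg:element value in the order it appears in the text."""
    return KVG_ELEMENT.findall(svg_text)


# Shortened sample based on the excerpt above (not the full file).
sample = '''<g id="kvg:0f9ab" kvg:element="嶺">
<g id="kvg:0f9ab-g1" kvg:element="山" kvg:position="top" kvg:radical="general">
</g>
</g>'''

print(kanji_from_text(sample))

# For the real file, the text would be read first, e.g.:
#   with open('0f9ab.svg', encoding='utf-8') as f:
#       print(kanji_from_text(f.read()))
```

This ignores nesting entirely, which is exactly why I would prefer a real XML approach.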