Python lxml tree: line[12:] creating multiple lines

Posted 2019-03-06 13:01

Question:

I'm creating an XML file with Python using lxml. I parse a file line by line, looking for a string, and if that string exists, I create a SubElement. I assign the SubElement a value that appears in the parsed file after the string I'm searching for.

Question: how do I get all the XML output onto one line in the output.xml file? Using line[12:] appears to be the problem. Details below.

Example file contents (one entry per line):

[testclass] unique_value_horse
[testclass] unique_value_cat
[testclass] unique_value_bird

Python code:

When I hardcode a string as below, the XML tree is written out as one continuous line. Perfect!

with open(file) as openfile:
    for line in openfile:
        if "[testclass]" in line:
            tagxyz = etree.SubElement(subroot, "tagxyz")
            tagxyz.text = "hardcodevalue"

When I instead assign everything from the 13th character onward as the value, each SubElement ends up on its own line in the output XML. This causes errors for the receiver of the output file:

with open(file) as openfile:
    for line in openfile:
        if "[testclass]" in line:
            tagxyz = etree.SubElement(subroot, "tagxyz")
            tagxyz.text = line[12:]
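A minimal reproduction of this behavior, as a sketch: `io.StringIO` stands in for the real file, and the `project` root and `subroot` element are assumed names (the question only shows `project` in the `tostring` call and never shows how `subroot` is created).

```python
import io
from lxml import etree

# Sample input from the question; io.StringIO stands in for the real file (assumption).
sample = io.StringIO(
    "[testclass] unique_value_horse\n"
    "[testclass] unique_value_cat\n"
    "[testclass] unique_value_bird\n"
)

# Hypothetical root and subroot; the question does not show how these are built.
project = etree.Element("project")
subroot = etree.SubElement(project, "subroot")

for line in sample:
    if "[testclass]" in line:
        # line[12:] drops "[testclass] " but keeps the trailing "\n" read from the file
        etree.SubElement(subroot, "tagxyz").text = line[12:]

output = etree.tostring(project)
print(output.decode())
```

Each serialized `tagxyz` text ends with a literal newline, and that is what splits the output across lines.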

I thought making the assignment on the same line might help, but it makes no difference:

with open(file) as openfile:
    for line in openfile:
        if "[testclass]" in line:
            etree.SubElement(subroot, "tagxyz").text = line[12:]

I have also tried etree.XMLParser(remove_blank_text=True): writing the output XML file, parsing it back in AFTER the fact, and writing it out again, but that doesn't help either. I understand this should help, but either I'm using it wrong or it won't actually solve my problem:

# etree.tostring() returns bytes, so open the files in binary mode
with open("output.xml", 'wb') as f:
    f.write(etree.tostring(project))

parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse("output.xml", parser)

with open("output2.xml", 'wb') as fl:
    fl.write(etree.tostring(tree))
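A sketch of why reparsing with `remove_blank_text=True` cannot help here: the option only discards whitespace-only text nodes that the parser considers ignorable (such as a newline between two sibling elements), while the unwanted newline sits inside a text node that also contains `unique_value_horse`. The input bytes below are a hand-written stand-in for output.xml, not the asker's actual file.

```python
from lxml import etree

# Hand-written stand-in for output.xml: two newlines are inside element text,
# one is a whitespace-only node between two sibling elements.
xml = (
    b"<project><tagxyz>unique_value_horse\n</tagxyz>\n"
    b"<tagxyz>unique_value_cat\n</tagxyz></project>"
)

parser = etree.XMLParser(remove_blank_text=True)
root = etree.fromstring(xml, parser)
result = etree.tostring(root)

# The newline between the siblings is dropped; the ones inside the text survive.
print(result.decode())
```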

Answer 1:

Each line you read keeps its line separator, \n, so line[12:] ends with a newline as well. You can strip it with str.rstrip():

with open(file) as openfile:
    for line in openfile:
        if "[testclass]" in line:
            etree.SubElement(subroot, "tagxyz").text = line[12:].rstrip('\n')
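Combining `rstrip('\n')` with the question's `line[12:]` slice gives the following self-contained sketch; as before, `io.StringIO` and the `project`/`subroot` names are assumptions standing in for the asker's file and tree.

```python
import io
from lxml import etree

# io.StringIO stands in for the real input file (assumption).
sample = io.StringIO(
    "[testclass] unique_value_horse\n"
    "[testclass] unique_value_cat\n"
    "[testclass] unique_value_bird\n"
)

# Hypothetical root and subroot elements.
project = etree.Element("project")
subroot = etree.SubElement(project, "subroot")

for line in sample:
    if "[testclass]" in line:
        # Drop the "[testclass] " prefix, then the trailing newline
        etree.SubElement(subroot, "tagxyz").text = line[12:].rstrip("\n")

output = etree.tostring(project)
print(output.decode())  # the whole tree on one line
```

If the input may also carry trailing spaces or a `\r` (for example, a file opened with `newline=''`), calling `rstrip()` with no argument removes all trailing whitespace instead.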

In the future, use the repr() function to debug such issues; you'll readily see the newline represented by its Python escape sequence:

>>> line = '[testclass] unique_value_horse\n'
>>> print(line)
[testclass] unique_value_horse

>>> print(repr(line))
'[testclass] unique_value_horse\n'
>>> print(repr(line.rstrip('\n')))
'[testclass] unique_value_horse'