I'm using the lxml.html library to parse an HTML document. I located a specific tag, which I call content_tag, and I want to change its content (i.e. the text between <div> and </div>). The new content is a string with some HTML in it, say 'Hello <b>world!</b>'.
How do I do that? I tried content_tag.text = 'Hello <b>world!</b>', but then it escapes all the HTML tags, replacing < with &lt;, etc.
I want to inject the text without escaping any HTML. How can I do that?
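For reference, here is a minimal sketch that reproduces the problem. The starting markup '<div>old text</div>' is made up for illustration; only content_tag and the new string come from the question.

```python
from lxml import html

# Hypothetical stand-in for the tag located in the real document.
content_tag = html.fragment_fromstring('<div>old text</div>')

# Assigning to .text stores a plain string, so the markup is treated as text.
content_tag.text = 'Hello <b>world!</b>'

escaped = html.tostring(content_tag, encoding='unicode')
print(escaped)  # <div>Hello &lt;b&gt;world!&lt;/b&gt;</div>
```

No <b> child is created; the angle brackets are escaped when the tree is serialized.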
Assuming content_tag doesn't have any subelement, you can just do:
After tinkering around, I found this solution:
This is one way:
See also: http://lxml.de/lxmlhtml.html#creating-html-with-the-e-factory
Edit: I should have confessed earlier that I'm not all that familiar with lxml. I looked at the docs and source briefly, but didn't find a clean solution. Perhaps someone more familiar will stop by and set us both straight.
In the meantime, this seems to work, but is not well tested:
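The snippet itself is missing. Since the later edit mentions removing text and children, this first version presumably left the existing content in place, so here is a sketch that parses the string and adds it after whatever the tag already contains. The helper name append_html is hypothetical, not part of lxml.

```python
from lxml import html


def append_html(element, markup):
    """Parse markup and add it after element's existing content.

    Hypothetical helper, not an lxml API.
    """
    for fragment in html.fragments_fromstring(markup):
        if isinstance(fragment, str):
            # Leading text has no element of its own: it belongs in the
            # tail of the last child, or in .text if there are no children.
            if len(element):
                last = element[-1]
                last.tail = (last.tail or '') + fragment
            else:
                element.text = (element.text or '') + fragment
        else:
            element.append(fragment)


# Made-up starting markup for illustration.
content_tag = html.fragment_fromstring('<div>Old <i>text</i>. </div>')
append_html(content_tag, 'Hello <b>world!</b>')

result = html.tostring(content_tag, encoding='unicode')
print(result)
```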
Edit again: this version also removes the element's existing text and children.