How to insert JavaScript into <script> element?

Posted 2019-07-03 16:38

Question:

What I have is:

from lxml import etree
myscript = "if(0 < 1){alert(\"Hello World!\");}"
html = etree.fromstring("<script></script>")

for element in html.xpath('//script'):
    element.text = myscript

result = etree.tostring(html)

What I get is:

>>> result
'<script>if(0 &lt; 1){alert("Hello World!");}</script>'

What I want is unescaped JavaScript:

>>> result
'<script>if(0 < 1){alert("Hello World!");}</script>'

Answer 1:

Your approach fails because you are setting the element's text content, which etree escapes when it serializes the tree as XML. Instead, build the <script> element with lxml.html and insert it in place of the old one, as in this sample:

In [1]: from lxml import html

In [2]: myscript = "<script>if(0 < 1){alert(\"Hello World!\");}</script>"

In [3]: template = html.fromstring("<script></script>")

# just a quick hack to get the <script> element without <html> <head>
In [4]: script_element = html.fromstring(myscript).xpath("//script")[0]

# insert new element then remove the old one
In [10]: for element in template.xpath("//script"):
   ....:     element.getparent().insert(0, script_element)
   ....:     element.getparent().remove(element)
   ....:

In [11]: print(html.tostring(template, encoding="unicode"))
<html><head><script>if(0 < 1){alert("Hello World!");}</script></head></html>

So yes, you can still use lxml to insert the element. I suggest using lxml.html rather than etree, since lxml.html handles HTML elements more gracefully.



Answer 2:

You can’t. lxml.etree and ElementTree are XML parsers, so whatever you want to parse or create with them has to be valid XML. And an unescaped < inside some node text is not valid XML. It’s valid HTML but not valid XML.

That’s why XHTML pages usually wrapped the contents of <script> tags in CDATA blocks: anything could go inside without breaking the XML structure.
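As a rough illustration of the CDATA approach, here is a minimal sketch using the standard library's xml.dom.minidom rather than lxml. The helper name script_with_cdata is hypothetical, and the JavaScript snippet is the one from the question.

```python
from xml.dom import minidom

# JavaScript snippet reused from the question.
MYSCRIPT = 'if(0 < 1){alert("Hello World!");}'


def script_with_cdata(js):
    """Hypothetical helper: return a <script> element whose body is a CDATA section."""
    doc = minidom.parseString("<script></script>")
    script = doc.documentElement
    # A CDATA section is written out verbatim, so "<" needs no escaping.
    script.appendChild(doc.createCDATASection(js))
    return script.toxml()


xhtml_script = script_with_cdata(MYSCRIPT)
print(xhtml_script)
```

The result is still well-formed XML, because the parser treats everything between <![CDATA[ and ]]> as plain character data.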

But in your case, you just want to produce HTML, and for that you should use an HTML parser, for example BeautifulSoup:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('<script></script>', 'html.parser')
>>> soup.find('script').string = 'if(0 < 1){alert("Hello World!");}'
>>> str(soup)
'<script>if(0 < 1){alert("Hello World!");}</script>'
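The difference between XML and HTML output can also be seen without any third-party package. The sketch below uses the standard library's xml.etree.ElementTree, not lxml, and serializes the same element twice, once per output method. lxml.etree.tostring accepts a method argument as well, but that is not exercised here, so treat it as an assumption to check against your lxml version.

```python
import xml.etree.ElementTree as ET

# JavaScript snippet reused from the question.
MYSCRIPT = 'if(0 < 1){alert("Hello World!");}'

script = ET.fromstring("<script></script>")
script.text = MYSCRIPT

# XML output must escape "<" in text to stay well-formed.
as_xml = ET.tostring(script, encoding="unicode", method="xml")

# HTML output writes the text of <script> and <style> through unchanged.
as_html = ET.tostring(script, encoding="unicode", method="html")

print(as_xml)
print(as_html)
```

Only the serialization step differs between the two calls. The element and its text are identical in both.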