random text from /dev/random raising an error in l

Posted 2019-05-20 17:07

To test my web app, I am pasting random characters from /dev/random into my web frontend. The html5lib.parse call below throws an error:

print repr(comment)
import html5lib
print html5lib.parse(comment, treebuilder="lxml")

'a\xef\xbf\xbd\xef\xbf\xbd\xc9\xb6E\xef\xbf\xbd\xef\xbf\xbd`\xef\xbf\xbd]\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd2 \x14\xef\xbf\xbd\xc7\xbe\xef\xbf\xbdy\xcb\x9c\xef\xbf\xbdi1O\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbdZ\xef\xbf\xbd.\xef\xbf\xbd\x17^C'

Unhandled Error
    Traceback (most recent call last):
      File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 893, in _inlineCallbacks
        result = g.send(result)
      File "/home/work/random/social/social/item.py", line 389, in _new
        convId, conv = yield plugin.create(request)
      File "/home/work/random/social/social/logging.py", line 47, in wrapper
        ret = func(*args, **kwargs)
      File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 1014, in unwindGenerator
        return _inlineCallbacks(None, f(*args, **kwargs), Deferred())
    --- <exception caught here> ---
      File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 893, in _inlineCallbacks
        result = g.send(result)
      File "/home/work/random/social/twisted/plugins/status.py", line 63, in create
        print html5lib.parse(comment, treebuilder="lxml")
      File "/usr/local/lib/python2.6/dist-packages/html5lib-0.90-py2.6.egg/html5lib/html5parser.py", line 38, in parse
        return p.parse(doc, encoding=encoding)
      File "/usr/local/lib/python2.6/dist-packages/html5lib-0.90-py2.6.egg/html5lib/html5parser.py", line 211, in parse
        parseMeta=parseMeta, useChardet=useChardet)
      File "/usr/local/lib/python2.6/dist-packages/html5lib-0.90-py2.6.egg/html5lib/html5parser.py", line 111, in _parse
        self.mainLoop()
      File "/usr/local/lib/python2.6/dist-packages/html5lib-0.90-py2.6.egg/html5lib/html5parser.py", line 174, in mainLoop
        self.phase.processCharacters(token)
      File "/usr/local/lib/python2.6/dist-packages/html5lib-0.90-py2.6.egg/html5lib/html5parser.py", line 572, in processCharacters
        self.parser.phase.processCharacters(token)
      File "/usr/local/lib/python2.6/dist-packages/html5lib-0.90-py2.6.egg/html5lib/html5parser.py", line 611, in processCharacters
        self.parser.phase.processCharacters(token)
      File "/usr/local/lib/python2.6/dist-packages/html5lib-0.90-py2.6.egg/html5lib/html5parser.py", line 652, in processCharacters
        self.parser.phase.processCharacters(token)
      File "/usr/local/lib/python2.6/dist-packages/html5lib-0.90-py2.6.egg/html5lib/html5parser.py", line 711, in processCharacters
        self.parser.phase.processCharacters(token)
      File "/usr/local/lib/python2.6/dist-packages/html5lib-0.90-py2.6.egg/html5lib/html5parser.py", line 804, in processCharacters
        self.parser.phase.processCharacters(token)
      File "/usr/local/lib/python2.6/dist-packages/html5lib-0.90-py2.6.egg/html5lib/html5parser.py", line 948, in processCharacters
        self.tree.insertText(token["data"])
      File "/usr/local/lib/python2.6/dist-packages/html5lib-0.90-py2.6.egg/html5lib/treebuilders/_base.py", line 288, in insertText
        parent.insertText(data)
      File "/usr/local/lib/python2.6/dist-packages/html5lib-0.90-py2.6.egg/html5lib/treebuilders/etree_lxml.py", line 225, in insertText
        builder.Element.insertText(self, data, insertBefore)
      File "/usr/local/lib/python2.6/dist-packages/html5lib-0.90-py2.6.egg/html5lib/treebuilders/etree.py", line 114, in insertText
        self._element.text += data
      File "lxml.etree.pyx", line 821, in lxml.etree._Element.text.__set__ (src/lxml/lxml.etree.c:33308)

      File "apihelpers.pxi", line 646, in lxml.etree._setNodeText (src/lxml/lxml.etree.c:15287)

      File "apihelpers.pxi", line 1295, in lxml.etree._utf8 (src/lxml/lxml.etree.c:20212)

    exceptions.ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes

Before committing a user-entered string, I do this:

comment.decode('utf-8').encode('utf-8', "replace")

but this does not seem to help in this case.

-- Abhi

1 answer
来,给爷笑一个
Reply #2 · 2019-05-20 17:41

The problem is that text in XML cannot include certain characters, mainly control characters with code points below 32 (tab, line feed, and carriage return are the only ones allowed in that range). The XML 1.0 Recommendation defines a Char as:

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

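/dev/random can produce bytes that fall outside this definition, e.g. control characters and some multi-byte characters. The string in the question contains \x14 and \x17, both of which are control characters that XML does not allow.

So you have to filter these characters out of the input before handing it to the parser; re-encoding the string does not remove them.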
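
One way to do that filtering is sketched below, as a regular expression built directly from the Char production quoted above. The function name strip_invalid_xml_chars and its replacement parameter are made up for illustration; they are not part of html5lib or lxml. The sketch assumes Python 3 string semantics. On the Python 2.6 setup in the question the input would have to be a unicode object, and a narrow Python 2 build would not handle the #x10000-#x10FFFF range as written.

```python
import re

# Matches any code point that is NOT an XML 1.0 Char:
# Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    u"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text, replacement=u""):
    """Return text with every non-XML character replaced (hypothetical helper)."""
    return _INVALID_XML_CHARS.sub(replacement, text)


# A shortened version of the input from the question, as raw bytes.
raw = b"a\xef\xbf\xbd \x14\xc7\xbey\x17^C"

# Decode first, so the filter works on characters rather than bytes.
# "replace" turns undecodable byte sequences into U+FFFD instead of raising.
text = raw.decode("utf-8", "replace")

# \x14 and \x17 are dropped; U+FFFD and the other characters are kept.
clean = strip_invalid_xml_chars(text)
```

The cleaned string can then be passed to html5lib.parse. Passing a visible replacement such as u"?" instead of the empty string keeps a trace of where characters were removed, which may be preferable for user-entered text.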
