I am trying to process some data with lxml. It works fine on my development server, but on production the following code:
parser = etree.XMLParser(encoding='cp1251')
throws:
File "parser.pxi", line 1288, in lxml.etree.XMLParser.__init__ (third_party/apphosting/python/lxml/src/lxml/lxml.etree.c:77726)
File "parser.pxi", line 738, in lxml.etree._BaseParser.__init__ (third_party/apphosting/python/lxml/src/lxml/lxml.etree.c:73404)
LookupError: unknown encoding: 'cp1251'
I am using lxml 2.3, and GAE appears to provide the same version. Why does this error occur?
Edit:
I passed other encodings to XMLParser, such as cp1252, ISO-8859-5, and ISO-8859-2. Each one threw the same error on GAE but worked on my local machine. These are popular encodings, so lxml on GAE should support them. I suspect something is wrong with the lxml build on GAE.
I created an issue: http://code.google.com/p/googleappengine/issues/detail?id=7315
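For anyone who wants to reproduce the comparison, here is a rough sketch of the check described above. It separates two questions: does Python's own codec registry know the encoding, and does the parser constructor accept it? The helper names (python_has_codec, parser_accepts, report) are made up for illustration. My guess, not confirmed, is that lxml does its own encoding lookup through the underlying XML library rather than through Python's codecs, which would explain a difference between the two columns.

```python
import codecs


def python_has_codec(name):
    """Return True if Python's own codec registry knows this encoding."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


def parser_accepts(make_parser, name):
    """Return True if make_parser(name) does not raise LookupError."""
    try:
        make_parser(name)
        return True
    except LookupError:
        return False


def report(make_parser, names):
    """Return (encoding, known to Python, accepted by parser) per name."""
    return [(n, python_has_codec(n), parser_accepts(make_parser, n))
            for n in names]


# Usage with lxml (run on both the local machine and GAE):
#   from lxml import etree
#   names = ["cp1251", "cp1252", "ISO-8859-5", "ISO-8859-2", "utf-8"]
#   for row in report(lambda enc: etree.XMLParser(encoding=enc), names):
#       print(row)
```

If Python knows an encoding but the parser rejects it, the problem is on the lxml side of that environment and not in the application code.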
Edit2:
Full traceback:
unknown encoding: 'cp1251'
Traceback (most recent call last):
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1511, in __call__
rv = self.handle_exception(request, response, e)
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1505, in __call__
rv = self.router.dispatch(request, response)
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1253, in default_dispatcher
return route.handler_adapter(request, response)
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1077, in __call__
return handler.dispatch()
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 547, in dispatch
return self.handle_exception(e, self.app.debug)
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 545, in dispatch
return method(*args, **kwargs)
File "/base/data/home/apps/s~my_cool_app_id/1.358126884781269352/main.py", line 29, in get
parser = etree.XMLParser(encoding='cp1251')
File "parser.pxi", line 1288, in lxml.etree.XMLParser.__init__ (third_party/apphosting/python/lxml/src/lxml/lxml.etree.c:77726)
File "parser.pxi", line 738, in lxml.etree._BaseParser.__init__ (third_party/apphosting/python/lxml/src/lxml/lxml.etree.c:73404)
LookupError: unknown encoding: 'cp1251'
There seems to be an open bug about this behavior on OS X, where specifying encoding="cp1252" produced the error above. The comments also report other affected systems: https://bugs.launchpad.net/lxml/+bug/707396
Have you tried specifying other encoding types? (to see if it's just a problem with cp1252)
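If it turns out that every non-UTF-8 encoding fails, one possible workaround is to avoid passing the encoding to the parser at all: decode the bytes with Python's own codec and hand lxml UTF-8. This is only a sketch. It assumes that Python's cp1251 codec is available on GAE (the failing lookup appears to be inside lxml, not Python), and transcode_to_utf8 is a hypothetical helper name, not part of lxml. The XML declaration is rewritten so that it no longer names the original encoding.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration that
# starts at the very beginning of the document.
_DECL_ENCODING = re.compile(
    br'''^(<\?xml[^>]*?encoding\s*=\s*)(["'])[A-Za-z0-9._\-]+\2''')


def transcode_to_utf8(raw, source_encoding):
    """Re-encode raw XML bytes as UTF-8 and fix the declared encoding.

    raw is a byte string in source_encoding. The result is a UTF-8
    byte string whose XML declaration (if present) says UTF-8.
    """
    utf8 = raw.decode(source_encoding).encode("utf-8")
    # The declaration is plain ASCII, so it can be patched on the bytes.
    return _DECL_ENCODING.sub(br'\1\2UTF-8\2', utf8, count=1)


# Usage with lxml, where data holds the cp1251-encoded document:
#   from lxml import etree
#   root = etree.fromstring(transcode_to_utf8(data, "cp1251"))
```

The cost is one extra decode/encode pass over the document in Python, which should be acceptable unless the input is very large.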