I'm trying to log a UTF-8 encoded string to a file using Python's logging package. As a toy example:
import logging

def logging_test():
    handler = logging.FileHandler("/home/ted/logfile.txt", "w",
                                  encoding = "UTF-8")
    formatter = logging.Formatter("%(message)s")
    handler.setFormatter(formatter)
    root_logger = logging.getLogger()
    root_logger.addHandler(handler)
    root_logger.setLevel(logging.INFO)
    # This is an o with a hat on it.
    byte_string = '\xc3\xb4'
    unicode_string = unicode("\xc3\xb4", "utf-8")
    print "printed unicode object: %s" % unicode_string
    # Explode
    root_logger.info(unicode_string)

if __name__ == "__main__":
    logging_test()
This explodes with UnicodeDecodeError on the logging.info() call.
At a lower level, Python's logging package is using the codecs package to open the log file, passing in the "UTF-8" argument as the encoding. That's all well and good, but it's trying to write byte strings to the file instead of unicode objects, which explodes. Essentially, Python is doing this:
file_handler.write(unicode_string.encode("UTF-8"))
When it should be doing this:
file_handler.write(unicode_string)
Is this a bug in Python, or am I taking crazy pills? FWIW, this is a stock Python 2.6 installation.
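To make the suspected failure concrete, here is a minimal sketch of the step I think goes wrong. In Python 2, when an already-encoded byte string is handed to a writer that expects unicode, the bytes are first decoded implicitly as ASCII. The sketch spells that implicit decode out as an explicit call so it behaves the same on Python 2 and 3. The helper name implicit_ascii_decode is made up for illustration and is not part of the logging or codecs packages.

```python
def implicit_ascii_decode(byte_string):
    """Mimic the ASCII decode Python 2 applies when bytes meet unicode."""
    return byte_string.decode("ascii")


unicode_string = u"\xf4"                     # o with a hat on it
utf8_bytes = unicode_string.encode("utf-8")  # the two bytes 0xC3 0xB4

try:
    implicit_ascii_decode(utf8_bytes)
    decode_failed = False
except UnicodeDecodeError:
    # 0xC3 is outside the ASCII range, so the decode cannot succeed.
    decode_failed = True

print("ASCII decode of the UTF-8 bytes failed: %s" % decode_failed)
```

If that is really what happens inside the handler, writing the unicode object directly would avoid the decode step altogether.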
Having code like:
Caused:
This happens because the format string is a byte string, while some of the format string arguments are unicode strings with non-ASCII characters:
Making the format string unicode fixes the issue:
So, in your logging configuration, make all format strings unicode:
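As a rough sketch of that advice (not the code from the original post, which did not survive here), the snippet below attaches a formatter built from a unicode format string to a UTF-8 FileHandler and reads the result back. The logger name, the helper log_unicode_message, and the temporary file are all assumptions made for the example.

```python
import io
import logging
import os
import tempfile


def log_unicode_message(path, message):
    """Log one unicode message to `path` through a UTF-8 FileHandler."""
    logger = logging.getLogger("unicode_format_demo")  # hypothetical name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.FileHandler(path, "w", encoding="utf-8")
    # The format string itself is a unicode literal, not a byte string.
    handler.setFormatter(logging.Formatter(u"%(levelname)s: %(message)s"))
    logger.addHandler(handler)
    try:
        logger.info(message)
    finally:
        logger.removeHandler(handler)
        handler.close()


fd, log_path = tempfile.mkstemp(suffix=".log")
os.close(fd)
log_unicode_message(log_path, u"caf\xe9 \xf4")

# Read the file back as UTF-8 to confirm the characters survived.
with io.open(log_path, encoding="utf-8") as f:
    logged_text = f.read()
os.remove(log_path)
```

Keeping both the format string and the message as unicode means no implicit ASCII conversion is needed anywhere before the handler encodes the record.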
And patch the default logging formatter to use a unicode format string:
Check that you have the latest Python 2.6 - some Unicode bugs were found and fixed since 2.6 came out. For example, on my Ubuntu Jaunty system, I ran your script copied and pasted, removing only the '/home/ted/' prefix from the log file name. Result (copied and pasted from a terminal window):
On a Windows box:
And the contents of the file:
This might also explain why Lennart Regebro couldn't reproduce it either.
If I understood your problem correctly, the same issue should arise on your system when you do just:
I guess automatic encoding to the locale encoding on Unix will not work until you have enabled the locale-aware if branch in the setencoding function in your site module via locale. This file usually resides in /usr/lib/python2.x; it is worth inspecting anyway. AFAIK, locale-aware setencoding is disabled by default (it's true for my Python 2.6 installation).
The choices are:
site.py is needed)
See also The Illusive setdefaultencoding by Ian Bicking and related links.
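Before touching site.py, it may help to check what your interpreter actually reports. A small sketch, assuming nothing beyond the standard library (the helper name describe_encodings is invented for this example): a stock Python 2 normally reports ascii as the default encoding, while Python 3 reports utf-8.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings Python reports for conversions and I/O."""
    return {
        # Used for implicit str/unicode conversions in Python 2.
        "default": sys.getdefaultencoding(),
        # What the locale settings suggest for text data.
        "preferred": locale.getpreferredencoding(),
        # May be None when stdout is not a terminal.
        "stdout": getattr(sys.stdout, "encoding", None),
    }


encodings = describe_encodings()
for name in sorted(encodings):
    print("%s: %s" % (name, encodings[name]))
```

If the default and the preferred encodings differ, printing a unicode object and logging it can behave differently on the same machine.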
Try this:
For what it's worth, I was expecting to have to use codecs.open to open the file with UTF-8 encoding, but either that's the default or something else is going on here, since it works as is.
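For comparison, this is roughly what the explicit codecs.open route looks like. It is only a sketch: the helper write_utf8 and the temporary file are assumptions for the example, and on Python 3 the built-in open with an encoding argument does the same job.

```python
import codecs
import os
import tempfile


def write_utf8(path, text):
    """Write a unicode string to `path`, letting codecs do the encoding."""
    f = codecs.open(path, "w", encoding="utf-8")
    try:
        f.write(text)
    finally:
        f.close()


fd, demo_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
write_utf8(demo_path, u"\xf4")

# Read the raw bytes back to see what actually landed on disk.
with open(demo_path, "rb") as raw:
    raw_bytes = raw.read()
os.remove(demo_path)
```

Reading the file in binary mode shows the encoded form directly, which is a quick way to confirm that the file really is UTF-8.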