Python unicode write to file crashes in command line

I'm having a problem wherein my Python 2.7.3rc2 code runs fine through an IDE (Aptana Studio 3 with PyDev), but crashes when I either double-click the .py file or try to run it from the Windows command line.

The problem line is where I try to write a string containing unicode characters to a file. The IDE has no problem with it, and writes the file properly with the unicode characters. The command line version complains that it can't encode certain characters.

The root of the question is: what is different between running it in the IDE and running it from the command line, such that one writes the unicode file properly and the other does not?

The ideal solution should have the command line version working exactly as the IDE version does.

EDIT: Sorry, I assumed it was obvious which call I was using to write the string to a file, but I'm new to Python. The actual call is write() on a file object f that was created with f = open(path, 'w'). I pass it the string I want written to the file, and that string contains unicode characters.

The full error message is:

Traceback (most recent call last):
  File "writer.py", line 46, in <module>
    write_listings(c, output_path)
  File "writer.py", line 33, in write_listings
    print name
  File "c:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 21-26: character maps to <undefined>

Here is an example string: 滑鐵盧安大略加拿大

Unfortunately I'm having trouble creating an SSCCE because I can't just put that string literal into a source code file without it complaining that I haven't declared an encoding. It's frustrating -- this was all working so well when I ran everything from the IDE and now I'm headed down a unicode rabbit hole!

EDIT: Thanks to Fredrik, I'm now able to make an SSCCE. Here it is:

# -*- coding: utf-8 -*-
str = u'滑鐵盧安大略加拿大'
f = open('test', 'w')
f.write(str)
f.close()

This SSCCE crashes when run from the command line but not from the IDE. Why is that?

EDIT: I added some additional code suggested by Edward Loper to verify that the version of Python is identical for the command line and IDE versions.

Here is the new code:

# -*- coding: utf-8 -*-
import sys
print sys.version
print open
print open.__module__

str = u'滑鐵盧安大略加拿大'
f = open('test', 'w')
f.write(str)
f.close()

Here is the output when run from the IDE:

2.7.3rc2 (default, Mar 18 2012, 22:59:27) [MSC v.1500 64 bit (AMD64)]
<built-in function open>
__builtin__

And here is the output when run from the command line:

2.7.3rc2 (default, Mar 18 2012, 22:59:27) [MSC v.1500 64 bit (AMD64)]
<built-in function open>
__builtin__
Traceback (most recent call last):
  File "test.py", line 9, in <module>
    f.write(str)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-8: ordinal not in range(128)

In my opinion, the question is still unanswered, because I still have no idea what makes it work in the IDE but not from the command line!
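For reference, here is a minimal sketch that reproduces the 'ascii' error above deterministically. It rests on an assumption about CPython 2.7: for a file returned by open(), write() with a unicode argument implicitly encodes it using sys.getdefaultencoding(), which is 'ascii' on a stock interpreter. The sketch performs that encode explicitly, so it behaves the same on any interpreter. The helper name ascii_failure_span is hypothetical.

```python
# -*- coding: utf-8 -*-
text = u'滑鐵盧安大略加拿大'


def ascii_failure_span(s):
    """Hypothetical helper: return (start, end) of the first run of
    characters that ASCII cannot encode, or None if s is pure ASCII."""
    try:
        # Assumption: this is the conversion Python 2's file.write() performs
        # implicitly when the default encoding is 'ascii'.
        s.encode('ascii')
    except UnicodeEncodeError as exc:
        print(exc)  # same kind of message as the traceback above
        return exc.start, exc.end
    return None


span = ascii_failure_span(text)
# end is exclusive, which is why the message says "position 0-8"
print(span)
```

All nine characters lie outside ASCII, so the failing run covers the whole string. Changing the implicit default encoding (which is what answer 1 below suspects the IDE does) is the only thing that would let the same write() call succeed.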

Answer 1:

As Fenikso said, you should encode a string before writing it to a file. The reason file.write() doesn't do this itself is that you need to specify which encoding (utf-8, utf-16, etc.) you want to use. The Python module codecs lets you create stream objects that know which encoding to use and apply it automatically. That's what Fenikso is using in his second example.
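To illustrate why write() cannot simply pick an encoding for you: the same unicode text turns into different byte sequences depending on the codec. This is a small sketch using the string from the question; it is written for Python 2.7 and should also run unchanged on Python 3.

```python
# -*- coding: utf-8 -*-
text = u'滑鐵盧安大略加拿大'

# Encode the same nine characters with three different codecs.
encoded = {}
for name in ('utf-8', 'utf-16', 'utf-16-le'):
    encoded[name] = text.encode(name)
    print('%-9s %2d bytes' % (name, len(encoded[name])))

# Each result decodes back to the original only with the matching codec.
for name, data in encoded.items():
    assert data.decode(name) == text
```

Each of these CJK characters takes 3 bytes in UTF-8 and 2 bytes in UTF-16; the plain 'utf-16' codec also prepends a 2-byte byte order mark. A reader of the file must know which choice was made, so the choice belongs to the programmer.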

As to why your code works in the IDE but not the command line, my guess is that your IDE is setting the "default encoding" to some non-default value. Try running this in both the IDE and the command line and see if it differs:

>>> import sys
>>> print sys.getdefaultencoding()

Here's some related information: http://blog.ianbicking.org/illusive-setdefaultencoding.html
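To compare the two environments more thoroughly, a small diagnostic like the following could be run once from the IDE and once from the command line. It is a sketch: the function name encoding_report is hypothetical, and which of the three values actually differs depends on how the IDE launches the interpreter.

```python
# -*- coding: utf-8 -*-
import locale
import sys


def encoding_report():
    """Hypothetical helper: collect encoding settings that often differ
    between an IDE console and a Windows command prompt."""
    return {
        # Python 2 uses this for implicit unicode <-> str conversion,
        # e.g. in file.write(); Python 3 always reports 'utf-8'.
        'default': sys.getdefaultencoding(),
        # Used by print for console output; may be None in Python 2
        # when output is redirected to a pipe or file.
        'stdout': getattr(sys.stdout, 'encoding', None),
        # The encoding the OS locale suggests for text data.
        'locale': locale.getpreferredencoding(),
    }


for key, value in sorted(encoding_report().items()):
    print('%-8s %s' % (key, value))
```

The 'stdout' line is also relevant to the first traceback in the question, where print name failed in the cp437 codec: that failure comes from the console encoding, not from the file write.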

Answer 2:

You should explicitly encode your string in the desired encoding before writing it to the file:

f.write(text.encode("cp1250", "replace")) # Czech Windows encoding, use your own

f.write(text.encode("utf-8", "replace")) # UTF-8

You can also explicitly open the file with a specific encoding:

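# -*- coding: utf-8 -*-
from __future__ import unicode_literals  # "abcč" below is a unicode string
import codecs

x = "abcč"
f = codecs.open("test.txt", "w", "utf-8", "replace")
f.write(x)  # encoded to UTF-8 by the codecs stream
f.close()   # flush the encoded bytes to disk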
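To confirm what codecs.open actually puts on disk, one can write the question's string and read the raw bytes back. This sketch writes to a hypothetical file inside a fresh temporary directory so it touches nothing else; it targets Python 2.7 and should also run on Python 3, where io.open(path, 'w', encoding='utf-8') is the more usual spelling.

```python
# -*- coding: utf-8 -*-
import codecs
import os
import tempfile

text = u'滑鐵盧安大略加拿大'

# Hypothetical output path in a new temporary directory.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'test.txt')

# The codecs stream encodes every unicode write to UTF-8.
f = codecs.open(path, 'w', 'utf-8', 'replace')
try:
    f.write(text)
finally:
    f.close()

# Read the file in binary mode to see exactly which bytes were written.
with open(path, 'rb') as raw:
    on_disk = raw.read()

print(on_disk == text.encode('utf-8'))

# Clean up the temporary file and directory.
os.remove(path)
os.rmdir(tmpdir)
```

Because the encoding is fixed when the file is opened, this works the same in the IDE and on the command line, regardless of the interpreter's default encoding.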

Answer 3:

This is what I do whenever I need to work with a specific encoding:

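#!/usr/bin/env python
# -*- coding: UTF-8 -*-
import codecs
import sys

# Wrap stdout in a writer that encodes unicode strings to UTF-8.
out = codecs.getwriter('utf-8')(sys.stdout)
out.write(u'some åäö-string')  # needs the u prefix: a byte string would be ASCII-decoded first and fail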
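To see exactly what such a writer emits, the sketch below swaps sys.stdout for an in-memory byte buffer (an assumption made purely so the output can be inspected). It runs on Python 2.7 and Python 3; note that under Python 3 sys.stdout is already a text stream, so the real-console equivalent would wrap sys.stdout.buffer instead.

```python
# -*- coding: utf-8 -*-
import codecs
import io

# An in-memory byte stream stands in for sys.stdout.
buf = io.BytesIO()
out = codecs.getwriter('utf-8')(buf)

out.write(u'some åäö-string')  # unicode in, UTF-8 bytes out

# Each of å, ä, ö becomes a two-byte UTF-8 sequence.
print(repr(buf.getvalue()))
```

The writer does nothing more than call encode('utf-8') on each unicode string and pass the bytes to the wrapped stream, which is why it helps only when the underlying stream accepts bytes.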