Print to UTF-8 encoded file, with platform-depende

In Python, what is the best way to write to a UTF-8 encoded file with platform-dependent newlines? The solution would ideally work quite transparently in a program that does a lot of printing in Python 2. (Information about Python 3 is welcome too!)

In fact, the standard way of writing to a UTF-8 file seems to be codecs.open('name.txt', 'w'). However, the documentation indicates that

(…) no automatic conversion of '\n' is done on reading and writing.

because the file is actually opened in binary mode. So how can I write to a UTF-8 file with proper platform-dependent newlines?

Note: The 't' mode seems to actually do the job (codecs.open('name.txt', 'wt')) with Python 2.6 on Windows XP, but is this documented and guaranteed to work?

Tags: python text utf-8 newline codec

3 answers

一纸荒年 Trace。

Reply #2 · 2019-01-26 07:13

Presuming Python 2.7.1 (those are the docs you quoted): the 'wt' mode is not documented (the only mode the docs mention is 'r'), and it does not work. The codecs module appends 'b' to the mode, which causes the call to fail:

>>> f = codecs.open('bar.txt', 'wt', encoding='utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\python27\lib\codecs.py", line 881, in open
    file = __builtin__.open(filename, mode, buffering)
ValueError: Invalid mode ('wtb')

Avoid the codecs module and DIY:

f = open('bar.txt', 'w')  # text mode: '\n' becomes the platform newline
f.write(unicode_object.encode('utf8'))  # encode explicitly (Python 2)
f.close()
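
Another option worth considering is io.open, which has been available since Python 2.6 and is the same function as the built-in open() in Python 3. It opens the file in text mode, encodes on write, and by default translates '\n' to the platform line ending. A minimal sketch, assuming the data is already unicode text; the helper name write_utf8_text and the scratch file are made up for illustration:

```python
import io
import os
import tempfile

def write_utf8_text(path, text):
    """Write unicode text as UTF-8, letting the io layer translate newlines."""
    # newline=None (the default) turns each '\n' into os.linesep on write.
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write(text)  # must be unicode in Python 2, str in Python 3

# Hypothetical scratch file, used only to inspect the bytes on disk.
path = os.path.join(tempfile.mkdtemp(), 'bar.txt')
write_utf8_text(path, u'line1\nline2\n')
with io.open(path, 'rb') as f:
    raw = f.read()
```

Here raw ends each line with whatever os.linesep is on the machine running the code. In Python 2 the file object rejects byte strings, so everything passed to write() has to be a unicode object.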

Update about Python 3.x:

It appears that codecs.open() has the same deficiency (it won't write the platform-specific line terminator). However, the built-in open(), which has an encoding argument, is happy to do it:

[Python 3.2 on Windows 7 Pro]
>>> import codecs
>>> f = codecs.open('bar.txt', 'w', encoding='utf8')
>>> f.write('line1\nline2\n')
>>> f.close()
>>> open('bar.txt', 'rb').read()
b'line1\nline2\n'
>>> f = open('bar.txt', 'w', encoding='utf8')
>>> f.write('line1\nline2\n')
12
>>> f.close()
>>> open('bar.txt', 'rb').read()
b'line1\r\nline2\r\n'
>>>

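The built-in open() in Python 3 also takes a newline argument that controls this translation on write. A small sketch, assuming Python 3 and a scratch file in a temporary directory (the helper name written_bytes is made up for illustration):

```python
import os
import tempfile

def written_bytes(newline):
    """Write two lines with the given newline setting and return the raw bytes."""
    path = os.path.join(tempfile.mkdtemp(), 'bar.txt')
    with open(path, 'w', encoding='utf8', newline=newline) as f:
        f.write('line1\nline2\n')
    with open(path, 'rb') as f:
        return f.read()

default_bytes = written_bytes(None)   # '\n' is written as os.linesep
raw_bytes = written_bytes('')         # no translation, '\n' stays '\n'
crlf_bytes = written_bytes('\r\n')    # always CRLF, whatever the platform
```

Leaving newline at its default gives the platform-dependent behaviour asked about in the question; the other two values are useful when the output format must not vary between machines.
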
Update about Python 2.6:

The docs say the same as the 2.7 docs. The difference is that the "bludgeon into binary mode" hack of appending "b" to the mode arg failed in 2.6 because "wtb" wasn't detected as an invalid mode, so the file was opened in text mode. It appears to work as you wanted, not as documented:

>>> import codecs
>>> f = codecs.open('fubar.txt', 'wt', encoding='utf8')
>>> f.write(u'\u0a0aline1\n\xffline2\n')
>>> f.close()
>>> open('fubar.txt', 'rb').read()
'\xe0\xa8\x8aline1\r\n\xc3\xbfline2\r\n' # "works"
>>> f.mode
'wtb' # oops
>>>


chillily

Reply #3 · 2019-01-26 07:25

Are you looking for os.linesep? http://www.python.org/doc//current/library/os.html#os.linesep
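
If os.linesep is used directly, the file should be opened in binary mode; the os documentation advises against using os.linesep as a line terminator for files opened in text mode, because text mode already translates '\n'. A rough sketch of the binary-mode route, with a hypothetical helper name and scratch file:

```python
import os
import tempfile

def encode_with_platform_newlines(text):
    """Return UTF-8 bytes with each newline replaced by os.linesep."""
    return text.replace(u'\n', os.linesep).encode('utf-8')

# Hypothetical scratch file, used only to inspect the bytes on disk.
path = os.path.join(tempfile.mkdtemp(), 'bar.txt')
with open(path, 'wb') as f:  # binary mode: bytes are written untouched
    f.write(encode_with_platform_newlines(u'line1\nline2\n'))
with open(path, 'rb') as f:
    raw = f.read()
```

This keeps the newline handling explicit, at the cost of having to route every write through the helper.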


在下西门庆

Reply #4 · 2019-01-26 07:25

In Python 2, why not encode explicitly?

with open('myfile.txt', 'w') as f:
    print >> f, some_unicode_text.encode('UTF-8')

Both embedded newlines and those emitted by print will be converted to the appropriate platform newline.
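
For Python 3, a rough equivalent uses the print() function with its file argument and lets open() do the encoding. This is a sketch under the assumption that the text is a str; the sample text and scratch file are made up for illustration:

```python
import os
import tempfile

some_text = 'caf\u00e9\nna\u00efve'  # sample data with an embedded newline

# Hypothetical scratch file, used only to inspect the bytes on disk.
path = os.path.join(tempfile.mkdtemp(), 'myfile.txt')
with open(path, 'w', encoding='utf-8') as f:
    print(some_text, file=f)  # print appends its own '\n' as well

with open(path, 'rb') as f:
    raw = f.read()
```

Both the embedded newline and the one added by print() go through the file's newline translation, so raw contains os.linesep in both places.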
