General Unicode/UTF-8 support for csv files in Python

The csv module in Python doesn't work properly when UTF-8/Unicode is involved. I have found snippets in the Python documentation and on other web pages that work for specific cases, but you have to understand exactly which encoding you are handling and use the appropriate snippet.

How can I read and write both byte strings (str) and Unicode strings to and from .csv files in a way that "just works" in Python 2.6? Or is this a limitation of Python 2.6 that has no simple solution?

Tags: python csv unicode utf-8 python-2.x

10 answers

贼婆χ

#2 · 2019-01-03 14:43

The wrapper unicode_csv_reader mentioned in the Python documentation accepts Unicode strings. It exists because the Python 2 csv module does not accept Unicode strings: csv is not aware of encodings or locales and simply treats the strings it receives as bytes. So the wrapper first encodes the Unicode strings to UTF-8, producing byte strings that csv can parse. Then, when it hands back the results from csv, it decodes those bytes again, converting the UTF-8 byte sequences back into the correct Unicode characters.
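
For comparison, here is a minimal Python 3 sketch of the same "convert at the boundary" idea, run in the opposite direction. Python 3's csv module wants text, so raw UTF-8 bytes are decoded once by io.TextIOWrapper before parsing. The helper name read_utf8_csv and the sample data are illustrative assumptions, and none of this applies to Python 2.6 itself.

```python
import csv
import io


def read_utf8_csv(raw_bytes):
    """Hypothetical helper: parse UTF-8 encoded CSV bytes into rows of str (Python 3)."""
    # Wrap the byte stream so csv.reader receives decoded text lines.
    # newline="" lets the csv module handle line endings itself.
    text_stream = io.TextIOWrapper(io.BytesIO(raw_bytes), encoding="utf-8", newline="")
    return list(csv.reader(text_stream))


data = "name,city\nJosé,München\n".encode("utf-8")
for row in read_utf8_csv(data):
    print(row)
```

The bytes are decoded exactly once, at the point where they enter the program. This mirrors the wrapper, which encodes once on the way into csv and decodes once on the way out.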

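If you give the wrapper plain byte strings, e.g. by using f.readlines(), it will raise a UnicodeDecodeError on any byte with a value > 127. This happens because calling .encode('utf-8') on a byte string makes Python 2 implicitly decode it as ASCII first. Use the wrapper when your program already holds the CSV data as Unicode strings, for example lines read with codecs.open(filename, encoding='utf-8').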

I can imagine that the wrapper still has one limitation. Since csv does not accept Unicode and does not accept multi-byte delimiters either, you can't parse files that use a non-ASCII Unicode character as the delimiter.
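
To make the delimiter limitation concrete: a character such as the euro sign occupies three bytes in UTF-8, so it cannot serve as the single-byte delimiter the Python 2 csv module requires. Python 3's csv module works on str, where the same sign is one character, so it is accepted directly. The illustration below is Python 3 only, and the sample line is made up.

```python
import csv

delimiter = "\u20ac"  # the euro sign

# In UTF-8 this one character becomes three bytes, which is why a
# byte-oriented csv module cannot use it as a delimiter.
print(len(delimiter), len(delimiter.encode("utf-8")))  # 1 3

# Python 3's csv.reader operates on text, so a one-character
# non-ASCII delimiter is accepted.
line = delimiter.join(["a", "b", "c"])
rows = list(csv.reader([line], delimiter=delimiter))
print(rows)  # [['a', 'b', 'c']]
```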

ら.Afraid

#3 · 2019-01-03 14:48

A bit of a late answer, but I have used unicodecsv with great success.

Explosion°爆炸

#4 · 2019-01-03 14:49

I can confirm that unicodecsv is a great replacement for the csv module. I simply replaced csv with unicodecsv in my source code, and it works like a charm.

成全新的幸福

#5 · 2019-01-03 14:49

You should consider tablib. It takes a completely different approach, but it deserves consideration under the "just works" requirement.

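import tablib

# Read the raw bytes and decode them to a Unicode string.
with open('some.csv', 'rb') as f:
    csv_text = f.read().decode("utf-8")

ds = tablib.Dataset()
ds.csv = csv_text  # the first row is used as the header
for row in ds.dict:
    print row["First name"]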

Warning: tablib will reject your CSV if not every row has the same number of items.
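
If you would rather find the offending rows yourself before handing the data to tablib, the standard csv module can report which rows differ in length from the first (header) row. This is a hypothetical helper written for Python 3. It is not part of tablib, and the sample data is made up.

```python
import csv
import io


def find_ragged_rows(csv_text):
    """Hypothetical helper: return (row_number, row) pairs whose length
    differs from the length of the first row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []
    expected = len(rows[0])
    # Row numbers start at 1 for the header row.
    return [(i, row) for i, row in enumerate(rows, start=1) if len(row) != expected]


sample = "First name,Last name\nAda,Lovelace\nAlan\n"
print(find_ragged_rows(sample))  # [(3, ['Alan'])]
```

Row numbers count parsed records, not physical lines, so a quoted field that spans several lines still counts as one row.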

时光不老，我们不散

#6 · 2019-01-03 14:49

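Maybe this is blatantly obvious, but for the sake of beginners I'll mention it.

In Python 3.x the csv module works on text (str) rather than bytes, and the encoding is handled when you open the file. As a result, any encoding Python supports works out of the box, and if you use this version you can stick to the standard module.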

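import csv

# newline="" is recommended by the csv docs so quoted newlines are handled correctly.
with open("foo.csv", encoding="utf-8", newline="") as f:
    r = csv.reader(f, delimiter=";")
    for row in r:
        print(row)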
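
Writing works the same way in Python 3: open the file in text mode with an explicit encoding and newline="", then hand it to csv.writer. Below is a round-trip sketch that assumes a temporary file and made-up sample rows.

```python
import csv
import os
import tempfile

rows = [["name", "city"], ["José", "München"], ["Zoë", "Kraków"]]

# Work in a temporary directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "foo.csv")

    # The encoding is applied by open(); csv.writer only ever sees str.
    with open(path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f, delimiter=";").writerows(rows)

    with open(path, encoding="utf-8", newline="") as f:
        read_back = list(csv.reader(f, delimiter=";"))

print(read_back == rows)  # True
```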

For additional discussion please see: Does python 3.1.3 support unicode in csv module?

再贱就再见

#7 · 2019-01-03 14:49

The Python 2 csv documentation already contains an example of Unicode usage, so why look for another one or reinvent the wheel?

# Python 2 only (relies on the unicode builtin)
import csv

def unicode_csv_reader(unicode_csv_data, dialect=csv.excel, **kwargs):
    # csv.py doesn't do Unicode; encode temporarily as UTF-8:
    csv_reader = csv.reader(utf_8_encoder(unicode_csv_data),
                            dialect=dialect, **kwargs)
    for row in csv_reader:
        # decode UTF-8 back to Unicode, cell by cell:
        yield [unicode(cell, 'utf-8') for cell in row]

def utf_8_encoder(unicode_csv_data):
    for line in unicode_csv_data:
        yield line.encode('utf-8')
