The csv module in Python doesn't work properly when there's UTF-8/Unicode involved. I have found, in the Python documentation and on other web pages, snippets that work for specific cases, but you have to know exactly which encoding you are handling and use the appropriate snippet.
How can I read and write both strings and Unicode strings from .csv files in a way that "just works" in Python 2.6? Or is this a limitation of Python 2.6 that has no simple solution?
Here is a slightly improved version of Maxim's answer, which can also skip the UTF-8 BOM:
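A minimal sketch of what such a reader pair might look like (an illustration based on the note below, not necessarily the answer's exact code; only the class names, the encoding keyword and the utf-8-sig behaviour come from the text):

    import csv

    class UnicodeCsvReader(object):
        """Wrap csv.reader: let it split the raw byte row, then decode each cell."""
        def __init__(self, f, encoding="utf-8", **kwargs):
            self.csv_reader = csv.reader(f, **kwargs)
            self.encoding = encoding

        def __iter__(self):
            return self

        def next(self):
            # csv.reader splits the undecoded byte string into fields
            row = self.csv_reader.next()
            # decode each field afterwards; with encoding='utf-8-sig'
            # a leading BOM is stripped from the first field
            return [unicode(cell, self.encoding) for cell in row]

        @property
        def line_num(self):
            return self.csv_reader.line_num

    class UnicodeDictReader(csv.DictReader):
        def __init__(self, f, encoding="utf-8", fieldnames=None, **kwds):
            csv.DictReader.__init__(self, f, fieldnames=fieldnames, **kwds)
            # swap in the unicode-aware reader underneath DictReader
            self.reader = UnicodeCsvReader(f, encoding=encoding, **kwds)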
Note that the presence of the BOM is not automatically detected. You must signal it is there by passing the encoding='utf-8-sig' argument to the constructor of UnicodeCsvReader or UnicodeDictReader. Encoding utf-8-sig is utf-8 with a BOM.

The example code of how to read Unicode given at http://docs.python.org/library/csv.html#examples looks to be obsolete, as it doesn't work with Python 2.6 and 2.7.
Here follows UnicodeDictReader, which works with utf-8 and maybe with other encodings, but I have only tested it on utf-8 inputs. The idea, in short, is to decode to Unicode only after a csv row has been split into fields by csv.reader.
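A sketch of that idea (a reconstruction for illustration, not necessarily the answer's original code; it assumes a header row and plain byte-string fields):

    import csv

    class UnicodeDictReader(csv.DictReader):
        def __init__(self, f, encoding="utf-8", fieldnames=None, **kwds):
            csv.DictReader.__init__(self, f, fieldnames=fieldnames, **kwds)
            self.encoding = encoding

        def next(self):
            # let csv.DictReader split the row into a dict of byte strings,
            # then decode keys and values to unicode
            row = csv.DictReader.next(self)
            return dict((unicode(k, self.encoding), unicode(v, self.encoding))
                        for k, v in row.iteritems())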
Usage (source file encoding is utf-8):
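A hypothetical example (the file name and the sample data are made up; it assumes the UnicodeDictReader sketched above):

    # -*- coding: utf-8 -*-
    # write a small utf-8 csv file so there is something to read back (made-up data)
    f = open('utf8_sample.csv', 'wb')
    f.write('name,city\n')
    f.write('José,Málaga\n')        # byte-string literals are utf-8 thanks to the coding line
    f.write('François,Québec\n')
    f.close()

    for row in UnicodeDictReader(open('utf8_sample.csv', 'rb')):
        # values are unicode objects; encode them for printing on a utf-8 terminal
        print row['name'].encode('utf-8'), row['city'].encode('utf-8')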
Output:
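(assuming the made-up sample data above)

    José Málaga
    François Québec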
The module provided here looks like a cool, simple, drop-in replacement for the csv module that allows you to work with utf-8 csv files.
I would add to itsadok's answer. By default, Excel saves csv files as latin-1 (which ucsv does not support). You can easily fix this by:
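One way to do that (a sketch with hypothetical file names, not necessarily the answer's original fix) is to transcode Excel's latin-1 export to utf-8 before feeding it to ucsv:

    import codecs

    # hypothetical file names: re-encode Excel's latin-1 export as utf-8
    src = codecs.open('excel_export.csv', 'r', encoding='latin-1')
    dst = codecs.open('excel_export_utf8.csv', 'w', encoding='utf-8')
    for line in src:
        dst.write(line)   # line is a unicode object; the writer encodes it as utf-8
    src.close()
    dst.close()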