I am new to Python, and I have a question about how to use Python to read and write CSV files. My file contains like Germany, French, etc. According to my code, the files can be read correctly in Python, but when I write it into a new CSV file, the unicode becomes some strange characters.
The data is like:
And my code is:
import csv
f=open('xxx.csv','rb')
reader=csv.reader(f)
wt=open('lll.csv','wb')
writer=csv.writer(wt,quoting=csv.QUOTE_ALL)
wt.close()
f.close()
And the result is like:
Would you please tell me what I should do to solve the problem? Thank you very much!
Another alternative:
Use the code from the unicodecsv package ...
https://pypi.python.org/pypi/unicodecsv/
This module is API compatible with the STDLIB csv module.
There is an example at the end of the csv module documentation that demonstrates how to deal with Unicode. Below is copied directly from that example. Note that the strings read or written will be Unicode strings. Don't pass a byte string to
UnicodeWriter.writerows
, for example.Input (UTF-8 encoded):
Output:
Make sure you encode and decode as appropriate.
This example will roundtrip some example text in utf-8 to a csv file and back out to demonstrate:
Prints:
Because
str
in python2 isbytes
actually. So if want to writeunicode
to csv, you must encodeunicode
tostr
usingutf-8
encoding.Use
class csv.DictWriter(csvfile, fieldnames, restval='', extrasaction='raise', dialect='excel', *args, **kwds)
:csvfile
:open(fp, 'w')
bytes
which are encoded withutf-8
writer.writerow({py2_unicode_to_str(k): py2_unicode_to_str(v) for k,v in row.items()})
csvfile
:open(fp, 'w')
str
asrow
towriter.writerow(row)
Finally code
Conclusion
In python3, just use the unicode
str
.In python2, use
unicode
handle text, usestr
when I/O occurs.I had the very same issue. The answer is that you are doing it right already. It is the problem of MS Excel. Try opening the file with another editor and you will notice that your encoding was successful already. To make MS Excel happy, move from UTF-8 to UTF-16. This should work:
I couldn't respond to Mark above, but I just made one modification which fixed the error which was caused if data in the cells was not unicode, i.e. float or int data. I replaced this line into the UnicodeWriter function: "self.writer.writerow([s.encode("utf-8") if type(s)==types.UnicodeType else s for s in row])" so that it became:
You will also need to "import types".