Python 2 and 3 csv reader

I'm trying to use the csv module to read a UTF-8 CSV file, and I'm having trouble writing code that works on both Python 2 and 3 because of encoding.

Here is the original code in Python 2.7:

with open(filename, 'rb') as csvfile:
    csv_reader = csv.reader(csvfile, quotechar='\"')
    langs = next(csv_reader)[1:]
    for row in csv_reader:
        pass

But when I run it with Python 3, it doesn't like the fact that I open the file without "encoding". I tried this:

with codecs.open(filename, 'r', encoding='utf-8') as csvfile:
    csv_reader = csv.reader(csvfile, quotechar='\"')
    langs = next(csv_reader)[1:]
    for row in csv_reader:
        pass

Now Python 2 can't decode the line in the "for" loop. So how should I do it?

Tags: python encoding csv python-3.x

3 Answers

做自己的国王

#2 · 2019-04-22 12:46

Update: While the code in my original answer works, I have since released a small package at https://pypi.python.org/pypi/csv342 that provides a Python 3-like interface for Python 2. So, independent of your Python version, you can simply do:

import csv342 as csv
import io
with io.open('some.csv', 'r', encoding='utf-8', newline='') as csv_file:
    for row in csv.reader(csv_file, delimiter='|'):
        print(row)

Original answer: Here's a solution that even with Python 2 actually decodes the text to Unicode strings and consequently works with encodings other than UTF-8.

The code below defines a function csv_rows() that yields the contents of a file as a sequence of lists. Example usage:

for row in csv_rows('some.csv', encoding='iso-8859-15', delimiter='|'):
    print(row)

Here are the two variants of csv_rows(): one for Python 3+ and another for Python 2.6+. At runtime the code automatically picks the proper variant. UTF8Recoder and UnicodeReader are verbatim copies of the examples in the Python 2.7 library documentation.

import csv
import io
import sys


if sys.version_info[0] >= 3:
    # Python 3 variant.
    def csv_rows(csv_path, encoding, **keywords):
        with io.open(csv_path, 'r', newline='', encoding=encoding) as csv_file:
            for row in csv.reader(csv_file, **keywords):
                yield row

else:
    # Python 2 variant.
    import codecs

    class UTF8Recoder:
        """
        Iterator that reads an encoded stream and reencodes the input to UTF-8
        """
        def __init__(self, f, encoding):
            self.reader = codecs.getreader(encoding)(f)

        def __iter__(self):
            return self

        def next(self):
            return self.reader.next().encode("utf-8")


    class UnicodeReader:
        """
        A CSV reader which will iterate over lines in the CSV file "f",
        which is encoded in the given encoding.
        """

        def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
            f = UTF8Recoder(f, encoding)
            self.reader = csv.reader(f, dialect=dialect, **kwds)

        def next(self):
            row = self.reader.next()
            return [unicode(s, "utf-8") for s in row]

        def __iter__(self):
            return self


    def csv_rows(csv_path, encoding, **kwds):
        with io.open(csv_path, 'rb') as csv_file:
            for row in UnicodeReader(csv_file, encoding=encoding, **kwds):
                yield row
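
As a quick check of the claim about encodings other than UTF-8, here is a minimal sketch that assumes Python 3, so only the first variant is repeated. The file name sample.csv, its contents, and the temporary directory are made up for illustration.

```python
import csv
import io
import os
import tempfile


def csv_rows(csv_path, encoding, **keywords):
    # Same as the Python 3 variant above: text mode with newline=''
    # so that the csv module handles line endings itself.
    with io.open(csv_path, 'r', newline='', encoding=encoding) as csv_file:
        for row in csv.reader(csv_file, **keywords):
            yield row


# Hypothetical sample data with characters outside ASCII.
sample_path = os.path.join(tempfile.mkdtemp(), 'sample.csv')
with io.open(sample_path, 'w', newline='', encoding='iso-8859-15') as f:
    f.write(u'item|price\r\ncaf\u00e9|3\u20ac\r\n')

rows = list(csv_rows(sample_path, encoding='iso-8859-15', delimiter='|'))
print(rows)
```

Each cell comes back as a text string, so the accented letter and the euro sign survive the round trip as long as the encoding passed to csv_rows() matches the one the file was written with.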


Root（大扎）

#3 · 2019-04-22 12:50

Indeed, in Python 2 the file should be opened in binary mode, but in Python 3 in text mode. Also in Python 3 newline='' should be specified (which you forgot).

You'll have to do the file opening in an if-block.

import sys

if sys.version_info[0] < 3: 
    infile = open(filename, 'rb')
else:
    infile = open(filename, 'r', newline='', encoding='utf8')


with infile as csvfile:
    ...
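
One way to keep that if-block out of the main code is to wrap it in a small function. This is only a sketch: open_csv() is a hypothetical helper name, and the sample file demo.csv and its contents are made up so the example can run on its own.

```python
import csv
import io
import os
import sys
import tempfile


def open_csv(filename, encoding='utf-8'):
    """Hypothetical helper: open a CSV file the way each version's csv module expects."""
    if sys.version_info[0] < 3:
        # Python 2: csv wants bytes, so the encoding argument is unused here.
        return open(filename, 'rb')
    # Python 3: csv wants text, with newline translation disabled.
    return open(filename, 'r', newline='', encoding=encoding)


# Made-up sample file so the sketch is self-contained.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.csv')
with io.open(demo_path, 'w', newline='', encoding='utf-8') as f:
    f.write(u'key,en,fr\r\nhello,Hello,Bonjour\r\n')

with open_csv(demo_path) as csvfile:
    csv_reader = csv.reader(csvfile, quotechar='"')
    langs = next(csv_reader)[1:]
    rows = list(csv_reader)

print(langs)
print(rows)
```

The body of the with block is the same as in the question; only the open call differs between versions.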


叼着烟拽天下

#4 · 2019-04-22 12:53

Old question, I know, but I was looking for how to do this, so I'm posting in case someone else comes across it and finds it useful.

This is how I solved mine, thanks to Lennart Regebro for the hint:

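import csv
import sys

if sys.version_info[0] >= 3:
    # Python 3: text mode; newline='' lets csv handle line endings itself.
    rd = csv.reader(open(input_file, 'r', newline='', encoding='iso8859-1'),
                    delimiter=';', quotechar='"')
else:
    # Python 2: binary mode; rows come back as undecoded byte strings.
    rd = csv.reader(open(input_file, 'rb'), delimiter=';', quotechar='"')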

then do what you need to do:

for row in rd:
       ......
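
Note that with this approach Python 3 yields text strings, while Python 2 yields byte strings that still need decoding if you want Unicode. A minimal sketch of a per-row fix follows; decode_row() is a hypothetical helper name, and the sample cells are made up.

```python
def decode_row(row, encoding='iso8859-1'):
    """Hypothetical helper: decode any byte-string cells in a row to text."""
    return [
        cell.decode(encoding) if isinstance(cell, bytes) else cell
        for cell in row
    ]


# Cells as Python 2's csv.reader would return them (byte strings).
print(decode_row([b'caf\xe9', b'42']))
# Cells as Python 3's csv.reader would return them (already text).
print(decode_row([u'caf\u00e9', u'42']))
```

Calling decode_row(row) at the top of the loop body gives the same kind of cells on both versions.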
