Removing non-ascii characters in a csv file

I am currently inserting data into my Django models from a CSV file. Below is the simple save function that I am using:

import csv

def save(self):
    with open('file.csv') as myfile:
        data = csv.reader(myfile, delimiter=',', quotechar='"')
        next(data, None)  # skip the header row
        for row in data:
            b = MyModel()
            b.create_from_csv_row(row)  # calls a method to save in models

The function works perfectly with ASCII characters. However, if the CSV file contains non-ASCII characters, an error is raised: UnicodeDecodeError: 'ascii' codec can't decode byte 0x93 in position 1526: ordinal not in range(128)

My question is: how can I remove non-ASCII characters before saving my CSV file, to avoid this error?

Thanks in advance.

Tags: python django csv converter

3 answers

Root（大扎）

#2 · 2019-06-25 10:49

If you really want to strip it, try:

import unicodedata

unicodedata.normalize('NFKD', title).encode('ascii','ignore')

WARNING: THIS WILL MODIFY YOUR DATA. It attempts to find a close match, e.g. ć -> c
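As a rough illustration of the normalization step above, here is a minimal sketch assuming Python 3, where `encode` returns bytes and a final `decode` is needed to get a string back. The helper name `to_ascii` and the sample strings are made up for this example and are not part of the original answer.

```python
import unicodedata

def to_ascii(text):
    """Approximate text with ASCII, dropping whatever has no ASCII equivalent."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize('NFKD', text)
    # 'ignore' silently drops every code point that ASCII cannot represent.
    return decomposed.encode('ascii', 'ignore').decode('ascii')

print(to_ascii('Kova\u0107'))            # the accented letter keeps its base: Kovac
print(to_ascii('\u201cquoted\u201d'))    # curly quotes have no decomposition and are removed
```

Note that characters without a decomposition, such as curly quotes, disappear entirely rather than being replaced by a lookalike.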

Perhaps a better answer is to use unicodecsv instead.

----- EDIT ----- Okay, if you don't care whether the non-ASCII data is represented at all, try the following:

# If row references a unicode string
b.create_from_csv_row(row.encode('ascii', 'ignore'))

If row is a collection, not a unicode string, you will need to iterate over the collection to the string level to re-serialize it.
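Since `csv.reader` yields each row as a list of strings, the collection case is probably the one that applies here. A minimal sketch of cleaning a row cell by cell, assuming Python 3 and assuming every cell is a string; `strip_non_ascii_row` is a hypothetical helper name:

```python
def strip_non_ascii_row(row):
    """Return a copy of a CSV row with non-ASCII characters removed from each cell."""
    return [cell.encode('ascii', 'ignore').decode('ascii') for cell in row]

row = ['1', 'caf\u00e9', '\u201chello\u201d']
print(strip_non_ascii_row(row))  # ['1', 'caf', 'hello']
```

The cleaned list could then be passed to `create_from_csv_row` in place of the raw row.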


贪生不怕死

#3 · 2019-06-25 10:52

If you want to remove non-ASCII characters from your data, iterate through it character by character and keep only the ASCII ones.

result = []
for char in data:
    if ord(char) < 128:  # code points 0-127 are ASCII
        result.append(char)  # or write, print, whatever you need

If you want to convert unicode characters to ascii, then the response above by DivinusVox is accurate.
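The same filtering can be applied to the raw file contents before the CSV parser ever sees them. The following is only a sketch, assuming Python 3 and a file small enough to hold in memory; `read_ascii_rows` and the sample bytes are invented for illustration.

```python
import csv
import io

def read_ascii_rows(raw_bytes):
    """Parse CSV bytes after discarding every byte outside the ASCII range."""
    # errors='ignore' drops undecodable bytes (0x80-0xFF) instead of raising.
    text = raw_bytes.decode('ascii', errors='ignore')
    reader = csv.reader(io.StringIO(text, newline=''), delimiter=',', quotechar='"')
    return list(reader)

sample = b'name,comment\nAnna,\x93nice\x94\n'
rows = read_ascii_rows(sample)
print(rows[1:])  # header skipped: [['Anna', 'nice']]
```

In practice `raw_bytes` would come from something like `open(path, 'rb').read()`.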


ら.Afraid

#4 · 2019-06-25 10:57

Pandas csv parser (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.parsers.read_csv.html) supports different encodings:

import pandas
data = pandas.read_csv(myfile, encoding='utf-8', quotechar='"', delimiter=',')
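If pulling in pandas is not an option, the same idea of declaring the encoding should also work with the standard `csv` module. The sketch below assumes Python 3. It also assumes the file is Windows-1252 encoded, which is only a guess based on byte 0x93 in the error message: that byte is a left curly double quote in Windows-1252 and is not valid on its own in UTF-8. The function name `read_rows` is hypothetical.

```python
import csv

def read_rows(path, encoding='cp1252'):
    """Read a CSV file with an explicit encoding and return the data rows."""
    # newline='' lets the csv module handle line endings itself.
    with open(path, encoding=encoding, newline='') as f:
        reader = csv.reader(f, delimiter=',', quotechar='"')
        next(reader, None)  # skip the header row
        return list(reader)
```

With this approach the non-ASCII characters are preserved as proper Unicode text instead of being removed, so nothing in the data is lost.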
