I have a similar problem to the one mentioned here but none of the suggested methods work for me.
I have a medium size utf-8
.csv file with a lot of non-ascii characters.
I am splitting the file by a particular value from one of the columns, and then I'd like to save each of the obtained dataframes as an .xlsx file with the characters preserved.
This doesn't work, as I am getting an error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 7: ordinal not in range(128)
Here is what I tried:
- Using
xlsxwriter
engine explicitly. This doesn't seem to change anything. Defining a function (below) to change encoding and throw away bad characters. This also doesn't change anything.
def changeencode(data): cols = data.columns for col in cols: if data[col].dtype == 'O': data[col] = data[col].str.decode('utf-8').str.encode('ascii', 'ignore') return data
Changing by hand all the offensive chars to some others. Still no effect (the quoted error was obtained after this change).
Encoding the file as
utf-16
(which, I believe, is the correct encoding since I want to be able to manipulate the file from within the excel afterwards) doesn't help either.
I believe that the problem is in the file itself (because of 2 and 3) but I have no idea how to get around it. I'd appreciate any help. The beginning of the file is pasted below.
"Submitted","your-name","youremail","phone","miasto","cityCF","innemiasto","languagesCF","morelanguages","wiek","partnerCF","messageCF","acceptance-795","Submitted Login","Submitted From","2015-12-25 14:07:58 +00:00","Zózia kryś","test@tes.pl","4444444","Wrocław","","testujemy polskie znaki","Polski","testujemy polskie znaki","44","test","test","1","Justyna","99.111.155.132",
EDIT
Some code (one of the versions, without the splitting part):
import pandas as pd
import string
import xlsxwriter
df = pd.read_csv('path-to-file.csv')
with pd.ExcelWriter ('test.xlsx') as writer:
df.to_excel(writer, sheet_name = 'sheet1',engine='xlsxwriter')
Supposedly this was a bug in the version of
pandas
which I was using back then. Right now, inpandas
ver. 0.19.2, the code below saves the csv from the question without any trouble (and with correct encoding).NB:
openpyxl
module have to be installed on your system.Try encoding the columns with non-ascii characters as
and then save the file to xlsx format with encoding 'utf8'
What if you save the csv files from pandas and then use win32com to convert to Excel. It would look something like this...