I'm unable to export one of my dataframes due to some encoding difficulty.
sjM.dtypes
Customer Name object
Total Sales float64
Sales Rank float64
Visit_Frequency float64
Last_Sale datetime64[ns]
dtype: object
csv export works fine
path = 'c:\\test'
sjM.to_csv(path + '.csv') # Works
but the excel export fails
sjM.to_excel(path + '.xls')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "testing.py", line 338, in <module>
sjM.to_excel(path + '.xls')
File "c:\Anaconda\Lib\site-packages\pandas\core\frame.py", line 1197, in to_excel
excel_writer.save()
File "c:\Anaconda\Lib\site-packages\pandas\io\excel.py", line 595, in save
return self.book.save(self.path)
File "c:\Anaconda\Lib\site-packages\xlwt\Workbook.py", line 662, in save
doc.save(filename, self.get_biff_data())
File "c:\Anaconda\Lib\site-packages\xlwt\Workbook.py", line 637, in get_biff_data
shared_str_table = self.__sst_rec()
File "c:\Anaconda\Lib\site-packages\xlwt\Workbook.py", line 599, in __sst_rec
return self.__sst.get_biff_record()
File "c:\Anaconda\Lib\site-packages\xlwt\BIFFRecords.py", line 76, in get_biff_record
self._add_to_sst(s)
File "c:\Anaconda\Lib\site-packages\xlwt\BIFFRecords.py", line 91, in _add_to_sst
u_str = upack2(s, self.encoding)
File "c:\Anaconda\Lib\site-packages\xlwt\UnicodeUtils.py", line 50, in upack2
us = unicode(s, encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x81 in position 22: ordinal not in range(128)
I know that the problem is coming from the 'Customer Name' column, as after deletion the export to excel works fine.
I've tried following advice from that question (Python pandas to_excel 'utf8' codec can't decode byte), using a function to decode and re-encode the offending column
def changeencode(data):
cols = data.columns
for col in cols:
if data[col].dtype == 'O':
data[col] = data[col].str.decode('latin-1').str.encode('utf-8')
return data
sJM = changeencode(sjM)
sjM['Customer Name'].str.decode('utf-8')
L2-00864 SETIA 2
K1-00279 BERKAT JAYA
L2-00664 TK. ANTO
BR00035 BRASIL JAYA,TK
RA00011 CV. RAHAYU SENTOSA
so the conversion to unicode appears to be successful
sjM.to_excel(path + '.xls')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "c:\Anaconda\Lib\site-packages\pandas\core\frame.py", line 1197, in to_excel
excel_writer.save()
File "c:\Anaconda\Lib\site-packages\pandas\io\excel.py", line 595, in save
return self.book.save(self.path)
File "c:\Anaconda\Lib\site-packages\xlwt\Workbook.py", line 662, in save
doc.save(filename, self.get_biff_data())
File "c:\Anaconda\Lib\site-packages\xlwt\Workbook.py", line 637, in get_biff_data
shared_str_table = self.__sst_rec()
File "c:\Anaconda\Lib\site-packages\xlwt\Workbook.py", line 599, in __sst_rec
return self.__sst.get_biff_record()
File "c:\Anaconda\Lib\site-packages\xlwt\BIFFRecords.py", line 76, in get_biff_record
self._add_to_sst(s)
File "c:\Anaconda\Lib\site-packages\xlwt\BIFFRecords.py", line 91, in _add_to_sst
u_str = upack2(s, self.encoding)
File "c:\Anaconda\Lib\site-packages\xlwt\UnicodeUtils.py", line 50, in upack2
us = unicode(s, encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 22: ordinal not in range(128)
- Why does it fails, even though the conversion to unicode appears to be successful ?
- How can i work around this issue to export that dataframe to excel ?
@Jeff
Thanks for showing me the right direction
steps used :
install xlsxwriter (not bundled with pandas)
sjM.to_excel(path + '.xlsx', sheet_name='Sheet1', engine='xlsxwriter')
You need to use pandas >= 0.13, and the
xlsxwriter
engine for excel, which supports native unicode writing.xlwt
, the default engine will support passing an encoding option will be available in 0.14.see here for the engine docs.