I keep getting:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 265-266: ordinal not in range(128)
when I try:
df.to_html("mypage.html")
Here is a sample that reproduces the problem:
import pandas as pd
df = pd.DataFrame({"a": [u'Rue du Gu\xc3\xa9, 78120 Sonchamp'], "b": [u"some other thing"]})
df.to_html("mypage.html")
The elements in "a" are of type "unicode".
Exporting to CSV works, because there you can pass an encoding:
df.to_csv("myfile.csv", encoding="utf-8")
Your problem is in other code. Your sample code has a Unicode string that has been mis-decoded as latin1, Windows-1252, or similar, since it has UTF-8 sequences in it. Here I undo the bad decoding and redecode as UTF-8, but you'll want to find where the wrong decode is being performed.

The way it worked for me:
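A minimal sketch of writing the file yourself, not necessarily the exact code used in this answer: get the HTML as a string and write it to a file opened with an explicit UTF-8 encoding. The helper name write_html_utf8 and the stand-in table string are assumptions made for illustration; with pandas, the string would come from df.to_html() called without a path, which returns the HTML instead of writing it.

```python
import io
import os
import tempfile


def write_html_utf8(table_html, path):
    """Wrap an HTML fragment in a minimal page and write it as UTF-8.

    Hypothetical helper: it only assumes `table_html` is a text string,
    such as the one returned by df.to_html() with no path argument.
    """
    page = (
        u"<!DOCTYPE html>\n<html>\n<head>\n"
        u'<meta charset="UTF-8">\n'  # declare the encoding for browsers
        u"</head>\n<body>\n" + table_html + u"\n</body>\n</html>\n"
    )
    # io.open takes an explicit encoding on both Python 2 and Python 3,
    # so the text is never pushed through the default ASCII codec.
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(page)
    return page


# Stand-in for html = df.to_html(); any text with non-ASCII characters works.
sample = u"<table><tr><td>Rue du Gu\xe9, 78120 Sonchamp</td></tr></table>"
path = os.path.join(tempfile.mkdtemp(), "mypage.html")
page = write_html_utf8(sample, path)
```

The key design choice is that the encoding is chosen where the file is opened, rather than left to whatever default the library picks when it opens the file for you.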
The issue is actually in using df.to_html("mypage.html") to save the HTML to a file directly. If instead you write the file yourself, you can avoid this encoding bug with pandas.
You may also need to specify the character set in the head of the HTML for it to show up properly in certain browsers (HTML5 has UTF-8 as default):
<meta charset="UTF-8">
This was the only method that worked for me out of the several I've seen.
If you really need to keep the output as HTML, you could try cleaning the data in a numpy array before calling to_html.
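One possible way to do that kind of cleaning, building on the earlier diagnosis that the sample string is UTF-8 that was mis-decoded as latin1: recover the original bytes, then decode them with the right codec. This is a sketch in plain Python rather than numpy; fix_mojibake is a hypothetical helper name, and latin1 as the wrong codec is an assumption that happens to fit the sample string.

```python
def fix_mojibake(text, wrong_encoding="latin1"):
    """Undo a wrong decode: get the original bytes back, then decode as UTF-8.

    Assumes `text` really is UTF-8 data that was decoded with
    `wrong_encoding`; correctly decoded text containing non-ASCII
    characters will usually raise UnicodeDecodeError here instead.
    """
    return text.encode(wrong_encoding).decode("utf-8")


broken = u"Rue du Gu\xc3\xa9, 78120 Sonchamp"  # the string from the question
fixed = fix_mojibake(broken)  # the two-character sequence becomes a single e-acute
```

On a DataFrame, the same function could be applied to the affected column with something like df["a"].map(fix_mojibake) before exporting. It treats the symptom only, so it is still worth finding where the wrong decode happens.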