How to fix: “UnicodeDecodeError: 'ascii' c

2018-12-31 07:37发布

as3:~/ngokevin-site# nano content/blog/20140114_test-chinese.mkd
as3:~/ngokevin-site# wok
Traceback (most recent call last):
File "/usr/local/bin/wok", line 4, in
Engine()
File "/usr/local/lib/python2.7/site-packages/wok/engine.py", line 104, in init
self.load_pages()
File "/usr/local/lib/python2.7/site-packages/wok/engine.py", line 238, in load_pages
p = Page.from_file(os.path.join(root, f), self.options, self, renderer)
File "/usr/local/lib/python2.7/site-packages/wok/page.py", line 111, in from_file
page.meta['content'] = page.renderer.render(page.original)
File "/usr/local/lib/python2.7/site-packages/wok/renderers.py", line 46, in render
return markdown(plain, Markdown.plugins)
File "/usr/local/lib/python2.7/site-packages/markdown/init.py", line 419, in markdown
return md.convert(text)
File "/usr/local/lib/python2.7/site-packages/markdown/init.py", line 281, in convert
source = unicode(source)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 1: ordinal not in range(128). -- Note: Markdown only accepts unicode input!

How to fix it?

In some other python-based static blog apps, Chinese post can be published successfully. Such as this app: http://github.com/vrypan/bucket3. In my site http://bc3.brite.biz/, Chinese post can be published successfully.

15条回答
流年柔荑漫光年
2楼-- · 2018-12-31 07:52

I had the same error, with URLs containing non-ascii chars (bytes with values > 128)

url = url.decode('utf8').encode('utf-8')

Note: utf-8, utf8 are simply aliases . Using only 'utf8' or 'utf-8' should work in the same way

In my case, worked for me, in Python 2.7, I suppose this assignment changed 'something' in the str internal representation--i.e., it forces the right decoding of the backed byte sequence in url and finally puts the string into a utf-8 str with all the magic in the right place. Unicode in Python is black magic for me. Hope useful

查看更多
千与千寻千般痛.
3楼-- · 2018-12-31 07:53

This error occurs when there are some non ASCII characters in our string and we are performing any operations on that string without proper decoding. This helped me solve my problem. I am reading a CSV file with columns ID,Text and decoding characters in it as below:

train_df = pd.read_csv("Example.csv")
train_data = train_df.values
for i in train_data:
    print("ID :" + i[0])
    text = i[1].decode("utf-8",errors="ignore").strip().lower()
    print("Text: " + text)
查看更多
不流泪的眼
4楼-- · 2018-12-31 07:54

In a Django (1.9.10)/Python 2.7.5 project I have frequent UnicodeDecodeError exceptions; mainly when I try to feed unicode strings to logging. I made a helper function for arbitrary objects to basically format to 8-bit ascii strings and replacing any characters not in the table to '?'. I think it's not the best solution but since the default encoding is ascii (and i don't want to change it) it will do:

def encode_for_logging(c, encoding='ascii'):
    if isinstance(c, basestring):
        return c.encode(encoding, 'replace')
    elif isinstance(c, Iterable):
        c_ = []
        for v in c:
            c_.append(encode_for_logging(v, encoding))
        return c_
    else:
        return encode_for_logging(unicode(c))
`

查看更多
孤独寂梦人
5楼-- · 2018-12-31 07:55

In short, to ensure proper unicode handling in Python 2:

  • use io.open for reading/writing files
  • use from __future__ import unicode_literals
  • configure other data inputs/outputs (e.g., databases, network) to use unicode
  • if you cannot configure outputs to utf-8, convert your output for them print(text.encode('ascii', 'replace').decode())

For explanations, see @Alastair McCormack's detailed answer.

查看更多
弹指情弦暗扣
6楼-- · 2018-12-31 07:58

I got the same problem with the string "Pastelería Mallorca" and I solved with:

unicode("Pastelería Mallorca", 'latin-1')
查看更多
泪湿衣
7楼-- · 2018-12-31 07:59

In some cases, when you check your default encoding (print sys.getdefaultencoding()), it returns that you are using ASCII. If you change to UTF-8, it doesn't work, depending on the content of your variable. I found another way:

import sys
reload(sys)  
sys.setdefaultencoding('Cp1252')
查看更多
登录 后发表回答