Encoding problem downloading HTML using mechanize

Posted 2020-03-24 10:46

browser = mechanize.Browser()
page = browser.open(url)
html = page.get_data()

print html

The output contains strange characters. I suppose it is a UTF-8 string, but Python doesn't know that and cannot display it properly.

How can I convert this string to a unicode string, like

u = u'test'

Tags: python unicode encoding utf-8 mechanize

3 Answers

祖国的老花朵

#2 · 2020-03-24 11:07

u = html.decode('utf-8')
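
As a small illustration of what that call does, here is a sketch that needs no network access. The byte string is a made-up stand-in for whatever `page.get_data()` returns, and the variable names are assumptions rather than anything from the question:

```python
# Hypothetical stand-in for the raw bytes returned by page.get_data():
# the word "cafe" with an acute accent on the e, encoded as UTF-8.
raw = b'caf\xc3\xa9'

# bytes -> unicode text
text = raw.decode('utf-8')

# The accented letter takes two bytes on the wire but is one character.
byte_count = len(raw)
char_count = len(text)

# If the page may contain invalid bytes, the 'replace' error handler
# substitutes U+FFFD instead of raising UnicodeDecodeError.
damaged = b'caf\xc3'.decode('utf-8', 'replace')
```

This only helps if the bytes really are UTF-8 text. If decoding fails on the very first bytes, the body may be compressed or in a different charset.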


Juvenile、少年°

#3 · 2020-03-24 11:18

You need to declare the encoding, like this:

#!/usr/bin/python
# -*- coding: iso-8859-15 -*-

mechanize needs it.

For more information, see http://www.python.org/dev/peps/pep-0263/

手持菜刀，她持情操

#4 · 2020-03-24 11:27

It turned out the response was gzipped, so it has to be decompressed before reading:

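import gzip

def ungzipResponse(r, b):
    """Decompress a gzipped response in place and hand it back to the browser."""
    headers = r.info()
    # .get() avoids a KeyError when the server sends no Content-Encoding header
    if headers.get('Content-Encoding') == 'gzip':
        gz = gzip.GzipFile(fileobj=r, mode='rb')
        html = gz.read()
        gz.close()
        # Mark the body as UTF-8 HTML for anything that inspects the headers later
        headers["Content-type"] = "text/html; charset=utf-8"
        r.set_data(html)
        b.set_response(r)

response = browser.open(url)
ungzipResponse(response, browser)
html = response.read()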
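
The same idea can be tried without mechanize or a network connection. The sketch below uses only the standard library; the helper name `decode_body` and the sample payload are hypothetical, and it assumes the page is UTF-8 unless told otherwise:

```python
import gzip
import io

def decode_body(body, content_encoding=None, charset='utf-8'):
    """Return unicode text from raw response bytes, gunzipping if needed."""
    # Gzip streams start with the magic bytes 1f 8b; checking them also
    # covers bodies that arrive compressed without a Content-Encoding header.
    if content_encoding == 'gzip' or body[:2] == b'\x1f\x8b':
        body = gzip.GzipFile(fileobj=io.BytesIO(body), mode='rb').read()
    return body.decode(charset)

# Build a fake gzipped page in memory to exercise the helper.
buf = io.BytesIO()
gz = gzip.GzipFile(fileobj=buf, mode='wb')
gz.write(u'<p>caf\xe9</p>'.encode('utf-8'))
gz.close()

compressed = buf.getvalue()
page = decode_body(compressed, 'gzip')
```

With mechanize, the raw bytes would come from `response.read()` and the header value from `response.info()`, as in the answer above; separating decompression from decoding makes it easier to see which of the two steps was missing.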
