import mechanize

browser = mechanize.Browser()
page = browser.open(url)
html = page.get_data()
print html
It prints some strange characters. I suppose the result is a UTF-8 string, but Python doesn't know that and cannot display it properly.
How can I convert this string to a unicode string, like
u = u'test'
It turned out the response was gzipped. This function decompresses it:
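import gzip

def ungzipResponse(r, b):
    # Decompress a gzipped mechanize response and hand it back to the browser.
    headers = r.info()
    # .get() avoids a KeyError when the server sends no Content-Encoding header
    if headers.get('Content-Encoding') == 'gzip':
        gz = gzip.GzipFile(fileobj=r, mode='rb')
        html = gz.read()
        gz.close()
        # assumes the page is UTF-8 encoded HTML
        headers["Content-type"] = "text/html; charset=utf-8"
        r.set_data(html)
        b.set_response(r)
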
response = browser.open(url)
ungzipResponse(response, browser)
html = response.read()
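Decompressing still leaves you with a byte string, so the question's last step (getting a unicode string) needs an explicit decode. Here is a minimal sketch of both steps without mechanize or a network connection. The helper name `decode_body` and the in-memory "response" are made up for illustration, and it assumes the page really is UTF-8:

```python
import gzip
import io


def decode_body(raw, content_encoding=None, charset="utf-8"):
    """Return the body as a unicode string, un-gzipping it first if needed.

    `decode_body` is a hypothetical helper, not part of mechanize.
    """
    if content_encoding == "gzip":
        raw = gzip.GzipFile(fileobj=io.BytesIO(raw), mode="rb").read()
    # bytes -> unicode; the charset is an assumption unless the server states it
    return raw.decode(charset)


# Build a stand-in for a gzipped response body.
original = u"caf\u00e9 test"
buf = io.BytesIO()
gz = gzip.GzipFile(fileobj=buf, mode="wb")
gz.write(original.encode("utf-8"))
gz.close()
compressed = buf.getvalue()

text = decode_body(compressed, content_encoding="gzip")
```

With the function above, the equivalent would be something like `html.decode('utf-8')` on what `response.read()` returns.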
You need to declare the encoding at the top of your script, like this:
#!/usr/bin/python
# -*- coding: iso-8859-15 -*-
mechanize needs it.
For more information, see:
http://www.python.org/dev/peps/pep-0263/
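To illustrate what that declaration does, as far as PEP 263 describes it: the line tells the interpreter how to decode the bytes of the source file itself, which matters for non-ASCII literals in the script. Data downloaded at runtime is decoded separately. A small Python 3 sketch, where the "file" is just a byte string built in memory and the names `source` and `value` are made up for the example:

```python
# Source bytes as they would sit on disk: saved as ISO-8859-15,
# with the PEP 263 declaration on the first line.
source = u'# -*- coding: iso-8859-15 -*-\nvalue = u"\u20ac"\n'.encode("iso-8859-15")

# compile() reads the declaration and decodes the literal accordingly.
namespace = {}
exec(compile(source, "<demo>", "exec"), namespace)
value = namespace["value"]
```

The euro sign is stored as the single byte 0xA4 in ISO-8859-15, and the declaration is what lets Python turn it back into the right character.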