I am using urlfetch to fetch a URL. When I pass the result to the html2text function (which strips all HTML tags), I get the following error:
UnicodeEncodeError: 'charmap' codec can't encode characters in position ... character maps to <undefined>
I've tried calling encode('UTF-8', 'ignore') on the string, but I keep getting this error.
Any ideas?
Thanks,
Joel
Some Code:
result = urlfetch.fetch(url="http://www.google.com")
html2text(result.content.encode('utf-8', 'ignore'))
And the error message:
File "C:\Python26\lib\encodings\cp1252.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 159-165: character maps to <undefined>
You need to decode the data you fetched first! result.content is a byte string, not unicode, and the codec to decode it with depends on the website you fetch.
When you have unicode and try to encode it with
some_unicode.encode('utf-8', 'ignore')
I can't imagine how it could throw an error.
OK, here is what you need to do:
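A minimal sketch of the idea, assuming the server names its charset in the Content-Type header. The helper name decode_body and its fallback parameter are made up for illustration and are not part of urlfetch or html2text:

```python
import re


def decode_body(raw_bytes, content_type=None, fallback="utf-8"):
    """Decode a fetched response body (bytes) into unicode text."""
    charset = None
    if content_type:
        # Look for "charset=..." in a header such as
        # "text/html; charset=ISO-8859-1".
        match = re.search(r"charset=[\"']?([\w.:-]+)", content_type, re.IGNORECASE)
        if match:
            charset = match.group(1)
    try:
        return raw_bytes.decode(charset or fallback)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name, or bytes that do not match it: use the
        # fallback codec and replace undecodable bytes instead of raising.
        return raw_bytes.decode(fallback, "replace")


# Stand-ins for result.content and the Content-Type response header.
raw = u"caf\u00e9".encode("iso-8859-1")
text = decode_body(raw, "text/html; charset=ISO-8859-1")

# Only now, with real unicode in hand, is encoding to UTF-8 safe.
utf8_bytes = text.encode("utf-8")
```

With urlfetch you would presumably pass result.content and the response's Content-Type header to the helper, then hand the decoded text to html2text. Pages that declare their charset only in a meta tag are not handled here.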
This is not really robust but it should show you the way.