BeautifulSoup encode("utf-8")

Posted 2019-05-27 03:22

Question:

from bs4 import BeautifulSoup
import urllib.request

link = 'https://mywebsite.org'
req = urllib.request.Request(link, headers={'User-Agent': 'Mozilla/5.0'})
html = urllib.request.urlopen(req).read()

soup = BeautifulSoup(html, "html.parser")
body = soup.find_all('div', {"class": "wrapper"})

print(body)

Hi guys, I have a problem with this code. When I run it, I get this error:

UnicodeEncodeError: 'charmap' codec can't encode character '\u2022' in position 138: character maps to <undefined>

I searched and found that I had to add

.encode("utf-8")

but if I add it, I get this error:

AttributeError: 'ResultSet' object has no attribute 'encode'

How can I resolve this?

I'm sorry for my English, but I'm Italian :)

Answer 1:

You're on Windows and trying to print to the console. The print() is throwing the exception.

The Windows console natively supports only 8-bit code pages, so any character outside the active code page will break (despite what people say about chcp 65001).
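
The failure can be reproduced without a Windows console by encoding the same character to an 8-bit code page by hand. This is only a sketch: cp437 is an assumed stand-in for the console code page (yours may differ), and console_safe is a hypothetical helper, not part of any library. It escapes whatever the code page cannot represent, so print() no longer raises.

```python
# Assumption: cp437 stands in for the console code page; the real one may differ.
CONSOLE_ENCODING = "cp437"

text = "\u2022 first item"  # U+2022 is the bullet named in the error message

# Encoding to an 8-bit code page fails for characters the page does not contain.
try:
    text.encode(CONSOLE_ENCODING)
    failed = False
except UnicodeEncodeError:
    failed = True


def console_safe(value, encoding=CONSOLE_ENCODING):
    """Hypothetical helper: turn characters the encoding lacks into escapes."""
    return value.encode(encoding, errors="backslashreplace").decode(encoding)


safe = console_safe(text)
print(safe)
```

With the code from the question, the call would be print(console_safe(str(body))). The bullets show up as escape sequences rather than glyphs, which is the price of staying inside the code page.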

You need to install and use https://github.com/Drekin/win-unicode-console. This module talks to the console API at a low level, giving support for multi-byte characters.

Alternatively, don't print to the console and write your output to a file, opened with an encoding. For example:

# body is a ResultSet (a list of tags), so convert it to a string before writing
with open("myoutput.log", "w", encoding="utf-8") as my_log:
    my_log.write(str(body))
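
Since find_all() returns a ResultSet, which behaves like a list of tags, neither .encode() nor write() can take it directly. One way around that is to convert each element to a string first. The sketch below makes two assumptions so it runs without bs4 or a network connection: a plain list of strings stands in for the ResultSet, and the file goes to a temporary directory.

```python
import os
import tempfile

# Stand-in for the ResultSet from soup.find_all(...): plain strings instead of
# tags (an assumption, so the sketch runs without bs4 or a network connection).
body = [
    '<div class="wrapper">\u2022 first</div>',
    '<div class="wrapper">\u2022 second</div>',
]

# Hypothetical output location; the answer writes to "myoutput.log".
log_path = os.path.join(tempfile.mkdtemp(), "myoutput.log")

# write() needs a str, so convert each element and join them with newlines.
with open(log_path, "w", encoding="utf-8") as my_log:
    my_log.write("\n".join(str(tag) for tag in body))

# Reading back with the same encoding returns the bullets intact.
with open(log_path, encoding="utf-8") as my_log:
    written = my_log.read()
```

With a real ResultSet the same str(tag) call yields each tag's markup, so the loop should carry over unchanged.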