How to change the encoding for a Python list?

Posted 2019-08-04 07:43

I use the following code to scrape a table from a Chinese website. The scraping itself works, but the contents I store in the list are not displayed properly.

import requests
from bs4 import BeautifulSoup
import pandas as pd

# Fetch the page and parse it with the lxml parser.
x = requests.get('http://www.sohu.com/a/79780904_126549')
bs = BeautifulSoup(x.text, 'lxml')

clg_list = []

# Store the text of every table cell, printing each one as it is stored.
for tr in bs.find_all('tr'):
    for td in tr.find_all('td'):
        clg_list.append(td.text)
        print(td.text)

When I print each cell's text, it shows Chinese characters. But when I print the list itself, it shows escapes such as u'\u4e00\u671f\uff0834\u6240\uff09'. I am not sure whether I should change the encoding or whether something else is wrong.

Tags: python python-2.7 web-scraping character-encoding beautifulsoup

1 answer

贼婆χ

Post #2 · 2019-08-04 07:52

There is nothing wrong in this case.

When you print a Python list, Python calls repr on each of the list's elements. In Python 2, the repr of a unicode string shows every non-ASCII character as an escaped Unicode code point.

>>> c = clg_list[0]
>>> c # Ask the interpreter to display the repr of c
u'\u201c985\u201d\u5de5\u7a0b\u5927\u5b66\u540d\u5355\uff08\u622a\u6b62\u52302011\u5e743\u670831\u65e5\uff09'

However, if you print the string itself, Python encodes it with a text encoding (for example, UTF-8) and your terminal displays the characters that match the encoded bytes.

>>> print c
“985”工程大学名单（截止到2011年3月31日）
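
If the goal is to see the list's contents as readable text on Python 2, one option is to print the strings themselves rather than the list, since only the list's repr applies the escaping. A minimal sketch, assuming the cells are unicode strings as in the question; `format_items` is a hypothetical helper, and the sample values stand in for the scraped data:

```python
# Sample data written with escapes so the file is plain ASCII
# and runs unchanged on Python 2 and Python 3.
clg_list = [u'\u4e00\u671f\uff0834\u6240\uff09', u'\u201c985\u201d']


def format_items(items):
    """Join the strings directly, so no repr() escaping is applied."""
    return u', '.join(items)


# Printing a string (not a list) displays the characters themselves.
print(format_items(clg_list))
```

Looping over the list and printing each item has the same effect, with one item per line.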

Note that in Python 3, printing the list shows the Chinese characters as you expect, because the repr of a Python 3 string leaves printable non-ASCII characters unescaped.
