Python Beautiful Soup: parsing a UTF-8 encoded table

Posted 2019-09-01 10:31

Question:

I'm trying to parse the following table, which is encoded in UTF-8 (this is only part of it):

<table cellspacing="0" cellpadding="3" border="0" id="ctl00_SPWebPartManager1_g_c001c0d9_0cb8_4b0f_b75a_7cc3b6f7d790_ctl00_HistoryData1_gridHistoryData_DataGrid1" style="width:100%;border-collapse:collapse;">
                            <tr class="gridHeader" valign="top">
                                <td class="titleGridRegNoB" align="center" valign="top"><span dir=RTL>שווי שוק (אלפי ש"ח)</span></td><td class="titleGridReg" align="center" valign="top">הון רשום למסחר</td><td class="titleGridReg" align="center" valign="top">שער נמוך</td><td class="titleGridReg" align="center" valign="top">שער גבוה</td><td class="titleGridReg" align="center" valign="top">שער בסיס</td><td class="titleGridReg" align="center" valign="top">שער פתיחה</td><td class="titleGridReg" align="center" valign="top"><span dir="rtl">שער נעילה (באגורות)</span>    
</td><td class="titleGridReg" align="center" valign="top">שער נעילה מתואם</td><td class="titleGridReg" align="center" valign="top">תאריך</td>
                            </tr><tr onmouseover="this.style.backgroundColor='#FDF1D7'" onmouseout="this.style.backgroundColor='#ffffff'">

My code is:

html = br.response().read().decode('utf-8')
soup = BeautifulSoup(html)

table_id = "ctl00_SPWebPartManager1_g_c001c0d9_0cb8_4b0f_b75a_7cc3b6f7d790_ctl00_HistoryData1_gridHistoryData_DataGrid1"
table = soup.findall("table", id=table_id)

And I'm getting the following error:

TypeError: 'NoneType' object is not callable
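With the code as posted, this error most likely comes from the method name rather than from the encoding. Beautiful Soup has no `findall` method (the search method is `find_all`, with `findAll` as the older alias), and accessing an unknown attribute on a tag is treated as a search for a child tag of that name. So `soup.findall` means `soup.find("findall")`, which returns None, and calling None raises the TypeError. A minimal sketch reproducing this, using a hypothetical, heavily trimmed table with a made-up id `grid`:

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily trimmed stand-in for the downloaded page.
html = '<table id="grid"><tr><td>תאריך</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

# Unknown attribute names on a tag are treated as tag searches:
# soup.findall means soup.find("findall"), and there is no <findall> tag.
print(soup.findall)  # None

# Calling that None reproduces the error from the question.
try:
    soup.findall("table", id="grid")
except TypeError as exc:
    print("TypeError:", exc)

# The real method is find_all() (findAll() is the older alias); it returns a list.
tables = soup.find_all("table", id="grid")
print(len(tables))  # 1
```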

Answer 1:

Since you are only matching on an id, and ids are unique, you can search by the id alone with find() and leave out the tag name.
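A minimal sketch of the lookup by id alone. The HTML string is a trimmed, hypothetical copy of the table from the question (only three header cells); in real code `html` would be the downloaded page. `find()` returns the single matching tag or None, so there is no list to index into.

```python
from bs4 import BeautifulSoup

table_id = ("ctl00_SPWebPartManager1_g_c001c0d9_0cb8_4b0f_b75a_7cc3b6f7d790"
            "_ctl00_HistoryData1_gridHistoryData_DataGrid1")

# Trimmed, hypothetical sample: just the table and part of its header row.
html = (
    '<table id="' + table_id + '">'
    '<tr class="gridHeader">'
    '<td class="titleGridReg">שער נמוך</td>'
    '<td class="titleGridReg">שער גבוה</td>'
    '<td class="titleGridReg">תאריך</td>'
    '</tr></table>'
)

soup = BeautifulSoup(html, "html.parser")

# ids are unique, so find() with just the id is enough; no tag name needed.
table = soup.find(id=table_id)
if table is None:
    raise LookupError("table not found; check the id and the downloaded HTML")

# Read the header cells; get_text(strip=True) drops surrounding whitespace.
header_row = table.find("tr", class_="gridHeader")
headers = [td.get_text(strip=True) for td in header_row.find_all("td")]
print(headers)
```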

UPDATE

Using the HTML from your paste:

# encoding=utf-8
from bs4 import BeautifulSoup
import requests

# Fetch the HTML the asker pasted to dpaste
data = requests.get('https://dpaste.de/EWCK/raw/')
soup = BeautifulSoup(data.text, "html.parser")

# find() returns the first (here: only) tag with this id, or None
print(soup.find("table",
                id="ctl00_SPWebPartManager1_g_c001c0d9_0cb8_4b0f_b75a_7cc3b6f7d790_ctl00_HistoryData1_gridHistoryData_DataGrid1"))

I'm using Python requests to fetch the page, which is equivalent to how you are fetching it. The code above works and finds the table with the correct id. One more thing to try: don't call .decode('utf-8'); pass the raw bytes from br.response().read() straight to BeautifulSoup, which can detect the encoding itself.
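A minimal sketch of the "don't decode first" suggestion, using a hypothetical in-memory byte string in place of `br.response().read()` so that mechanize is not needed to run it. When Beautiful Soup is given bytes, it works out the encoding itself (here from the declared `<meta charset>`) and records its choice in `original_encoding`; when given an already-decoded str, that attribute stays None.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for br.response().read(): raw UTF-8 bytes, not decoded.
raw = ('<html><head><meta charset="utf-8"></head><body>'
       '<table id="grid"><tr><td>שער נעילה מתואם</td></tr></table>'
       '</body></html>').encode("utf-8")

# Passing bytes lets Beautiful Soup detect the encoding before parsing.
soup = BeautifulSoup(raw, "html.parser")
print(soup.original_encoding)  # the encoding it settled on

# The Hebrew cell text comes back as a normal Python str.
cell = soup.find(id="grid").td
print(cell.get_text())
```

Decoding yourself also works when you know the encoding for certain; the difference is only who does the decoding.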