BeautifulSoup table data extraction - data not sho

Posted 2020-01-20 02:57

Question:

The data I need does not show up when I run my Python code. It is visible when I use "Inspect Element" in Chrome, but not in "View Source".

My code:

import bs4 as bs
import urllib.request

url = 'https://ethplorer.io/address/0x8b353021189375591723e7384262f45709a3c3dc'
page = urllib.request.urlopen(url)
soup = bs.BeautifulSoup(page, 'html.parser')

# Print every <td class="list-field"> cell and count them
cat = 0
for category in soup.find_all('td', {'class': 'list-field'}):
    print(category)
    cat += 1

It pulls out the line I need:

<td class="list-field" id="address-token-holdersCount"></td>

However, on the live page this cell has a value, 2345, as shown below.

When I check the page using "Inspect Element", the needed part looks like this:

<table class="table">
    <tbody>
        <tr class="even last">
            <td>Holders</td>
            <td id="address-token-holdersCount" class="list-field">"2345"</td>
        </tr>
    </tbody>
</table>

What do you recommend to fix this issue?

Answer 1:

As you found out yourself, the value is not present in the page source; it is loaded dynamically through an AJAX request. The urllib module (or requests) returns only the page source and does not run the page's JavaScript, which is why you cannot get that value directly.
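To make this concrete, here is a minimal sketch that parses the static cell exactly as printed in the question and shows that it contains no text. It uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages. The CellText class is a hypothetical helper written for this illustration and only handles simple, non-nested cells.

```python
from html.parser import HTMLParser

# Static markup as the server sends it (the line printed in the question).
STATIC_HTML = '<td class="list-field" id="address-token-holdersCount"></td>'


class CellText(HTMLParser):
    """Collect the text inside the <td> that carries a given id.

    Hypothetical helper for illustration; it does not handle nested cells.
    """

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.inside = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "td" and dict(attrs).get("id") == self.target_id:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.text += data


parser = CellText("address-token-holdersCount")
parser.feed(STATIC_HTML)
parser.close()
print(repr(parser.text))  # '' : the cell exists, but it is empty
```

Any parser that only sees the downloaded HTML gives the same result, because the number is written into the cell later by JavaScript in the browser.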

Go to Developer Tools > Network > XHR and refresh the page. You'll see an AJAX request made to this URL:

https://ethplorer.io/service/service.php?data=0x8b353021189375591723e7384262f45709a3c3dc

This URL returns the data as JSON. If you look through it, you can find the Holders number and read it with the requests module and its built-in .json() method:

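import requests

# JSON endpoint that the page itself calls via AJAX
url = 'https://ethplorer.io/service/service.php?data=0x8b353021189375591723e7384262f45709a3c3dc'
r = requests.get(url)
data = r.json()

# The holder count is nested under pager -> holders -> total
holders = data['pager']['holders']['total']
print(holders)
# 2346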
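If the response ever lacks one of those keys, the direct lookup raises a KeyError. A slightly more defensive variant might look like the sketch below. It assumes the endpoint still returns the same pager/holders/total nesting, and it uses urllib (already imported in the question) instead of requests. The names fetch_address_data and holders_total are hypothetical helpers, and the sample dictionary is a made-up, trimmed-down payload used only to exercise the lookup offline.

```python
import json
import urllib.request

# Same endpoint as above, with the address as a placeholder.
SERVICE_URL = "https://ethplorer.io/service/service.php?data={address}"


def fetch_address_data(address, timeout=10):
    """Download and decode the JSON that the page requests via AJAX."""
    url = SERVICE_URL.format(address=address)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def holders_total(data):
    """Return data['pager']['holders']['total'], or None if a key is missing."""
    node = data
    for key in ("pager", "holders", "total"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


# Hypothetical payload containing only the path used in the answer.
sample = {"pager": {"holders": {"total": 2346}}}
print(holders_total(sample))  # 2346
print(holders_total({}))      # None
```

Against the live site, the call would be holders_total(fetch_address_data('0x8b353021189375591723e7384262f45709a3c3dc')). A None result then signals that the response format differs from what the answer describes.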