I want to scrape DIV content that is created by a JavaScript function, using a Python script. I tried BS4, but it does not return the dynamic data; it only shows the page's source code.
Sample code:
import requests
from bs4 import BeautifulSoup
URL = "https://rawgit.com/skysoft999/tableauJS/master/example.html"
r = requests.get(URL)
soup = BeautifulSoup(r.content, 'html5lib')
for row in soup.find_all('div', attrs={'class': 'quote'}):
    print(row)
print(soup.prettify())
Sample HTML source code is in Pastebin
Sample data to be extracted:
The initial HTML does not contain the data you want to scrape, which is why BeautifulSoup alone is not enough. You can load the page with Selenium and then scrape the content.
Code:
Output:
The code assumes that the button is initially disabled:
<button id="getData" onclick="getUnderlyingData()" disabled>Get Data</button>
and that the data is not loaded automatically, but only when the button is clicked. Therefore you need to delete this line:
setTimeout(function(){ getUnderlyingData(); }, 3000);
You can find a working demo of your example here: http://demo-tableau.bitballoon.com/.
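The Selenium approach described above could look roughly like the sketch below. It is only an illustration under stated assumptions: the button id getData comes from the HTML shown above, but fetch_rendered_html, div_texts, and the data_selector parameter are hypothetical names, and the selector that matches the loaded data depends on the page. Chrome with a matching driver is assumed to be installed. The extraction helper uses only the standard library here; passing the same rendered HTML string to BeautifulSoup would work as well.

```python
from html.parser import HTMLParser


def fetch_rendered_html(url, data_selector, button_id="getData", timeout=20):
    """Load url in a real browser, click the button, and return the rendered HTML.

    data_selector is a hypothetical CSS selector for an element that only
    exists once the JavaScript has inserted the data.
    """
    # Imported here so the parsing helper below works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: Chrome and its driver are available
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)
        # The button starts out disabled, so wait until the page enables it.
        wait.until(EC.element_to_be_clickable((By.ID, button_id))).click()
        # Wait until the script has added the data to the DOM.
        wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, data_selector)))
        return driver.page_source
    finally:
        driver.quit()


class DivTextParser(HTMLParser):
    """Collect the text of every <div> that carries a given class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0  # greater than 0 while inside a matching div
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1  # nested div inside a matching div
        elif self.class_name in classes:
            self.depth = 1
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.texts[-1] += data


def div_texts(html, class_name):
    """Return the stripped text of each div with class_name, in document order."""
    parser = DivTextParser(class_name)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.texts]
```

A call such as fetch_rendered_html(URL, data_selector="div.quote") followed by div_texts(html, "quote") would then return the text of the matching divs, provided the page really renders its data into elements with that class, which is an assumption taken from the question's code.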