I'm trying to scrape data from the public site asx.com.au.
The page http://www.asx.com.au/asx/research/company.do#!/ACB/details contains a div
with class 'view-content', which holds the information I need.
But when I fetch this page with Python's urllib2.urlopen,
that div is empty:
import urllib2
from bs4 import BeautifulSoup
url = 'http://www.asx.com.au/asx/research/company.do#!/ACB/details'
page = urllib2.urlopen(url).read()
soup = BeautifulSoup(page, "html.parser")
contentDiv = soup.find("div", {"class": "view-content"})
print(contentDiv)
# the result is an empty div:
# <div class="view-content" ui-view=""></div>
Is it possible to access the contents of that div programmatically?
Edit: as per the comment, it appears that the content is rendered via Angular.js.
Is it possible to trigger the rendering of that content via Python?
This page uses JavaScript to read data from the server and fill in the page.
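One way to see why the fetched HTML never contains the company data: everything after the # in the URL is a fragment, which is not part of the HTTP request and is only interpreted by the JavaScript running in the browser. A small sketch using Python 3's urllib.parse (an assumption on my part, since the question uses Python 2's urllib2):

```python
from urllib.parse import urldefrag

url = "http://www.asx.com.au/asx/research/company.do#!/ACB/details"

# Split the URL into the part that is requested from the server
# and the fragment that only client-side code gets to see.
requested, fragment = urldefrag(url)

print(requested)
print(fragment)
```

So the server always returns the same page shell, and the "!/ACB/details" part is handled afterwards by the Angular app in the browser.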
I see you use the developer tools in Chrome. Look in the "Network" tab at the "XHR" or "JS" requests.
I found this URL:
http://data.asx.com.au/data/1/company/ACB?fields=primary_share,latest_annual_reports,last_dividend,primary_share.indices&callback=angular.callbacks._0
This URL returns all the data in almost-JSON format: the JSON is wrapped in a call to the callback function (JSONP).
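If you ever have to work with the wrapped response, the wrapper can be stripped before parsing. This is only a rough sketch: unwrap_jsonp is a helper name I made up, and the sample string below is invented for illustration, not real ASX data or real field names.

```python
import json
import re


def unwrap_jsonp(text):
    """Strip a JSONP wrapper such as callback({...}); and parse the JSON inside."""
    # Callback name (letters, digits, _, $, dots), then the payload in parentheses.
    match = re.match(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", text, re.DOTALL)
    if match is None:
        raise ValueError("not a JSONP response")
    return json.loads(match.group(1))


# Invented sample payload; the keys are hypothetical.
sample = 'angular.callbacks._0({"example_key": "example value", "nested": {"n": 1}});'
data = unwrap_jsonp(sample)
print(data["example_key"])
```

The regular expression is deliberately simple and assumes the whole response is a single callback call.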
But if you use this link without
&callback=angular.callbacks._0
then you get the data in pure JSON format, and you can use the json
module to convert it to a Python dictionary.
EDIT: working code
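A minimal sketch of that approach, written for Python 3 (urllib.request instead of the question's urllib2). The names BASE_URL, FIELDS, build_url and fetch_company are mine, and it assumes the endpoint still responds with plain JSON when the callback parameter is left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint and field list taken from the URL found in the Network tab.
BASE_URL = "http://data.asx.com.au/data/1/company/"
FIELDS = "primary_share,latest_annual_reports,last_dividend,primary_share.indices"


def build_url(code, fields=FIELDS):
    """Build the data URL for a company code, without the JSONP callback."""
    # safe="," keeps the commas in the field list unescaped.
    return BASE_URL + code + "?" + urlencode({"fields": fields}, safe=",")


def fetch_company(code):
    """Download the data for one company and parse it with the json module."""
    with urlopen(build_url(code)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling fetch_company("ACB") should then give you the parsed data to read the values you need from, provided the server answers as described.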
Output: