I'd like to be able to screen-scrape Morningstar web pages. Morningstar provides information about a mutual fund that I routinely look up but haven't been able to find elsewhere, namely:
- total return compared against benchmark
- total return compared against peers
- percentile ranking
Here's an example: morningstar example
As a prelude to screen-scraping, I need to be able to download the web page with the desired content. Unfortunately, when I try using Java SE 6 or wget to retrieve the above example link, I only get a portion of the HTML: the tables displaying the total return figures are absent. I get the same result if I use my browser (Chrome) to save the page as HTML only. However, if I use my browser to save the complete page (HTML, JS, CSS, and everything else), the downloaded HTML does contain the interesting information.
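For reference, a plain HTTP download in Java looks roughly like the sketch below. This is not my exact code: the class name `PageFetcher`, the timeouts, the UTF-8 decoding, and the command-line arguments (a URL and a marker string to look for) are assumptions made for illustration. A download like this returns only the bytes the server sends for that one URL; it does not run scripts or fetch any other resources.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper class, named for illustration only.
class PageFetcher {

    // Reads the whole stream into a String, decoding it as UTF-8 (an assumption).
    static String readAll(InputStream in) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"));
        StringBuilder sb = new StringBuilder();
        try {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } finally {
            reader.close(); // try/finally keeps this compatible with Java SE 6
        }
        return sb.toString();
    }

    // Fetches the response body for a single URL; no JavaScript is executed.
    static String fetch(String address) throws IOException {
        URLConnection conn = new URL(address).openConnection();
        conn.setConnectTimeout(10000); // milliseconds, arbitrary choice
        conn.setReadTimeout(10000);
        return readAll(conn.getInputStream());
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: PageFetcher <url> <marker>");
            return;
        }
        String html = fetch(args[0]);
        System.out.println("Downloaded " + html.length() + " characters");
        // Reports whether the text I expect (e.g. a table heading) is in the raw HTML.
        System.out.println("Contains marker: " + html.contains(args[1]));
    }
}
```

Running it with the page URL and a string that appears in the missing tables is a quick way to confirm whether that content is present in the raw response at all.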
I have two questions:
- How can I programmatically download the entire html file? Though I'm writing this program in Java, I don't mind invoking an external tool.
- Why were my aforementioned attempts not yielding the HTML that I was expecting?
Thanks.
As a side note, I looked at Yahoo Finance and YQL/datatables as alternatives, but Yahoo Finance doesn't provide percentile rankings. If you look up the performance of a mutual fund, you'll see N/A values for the rankings. Yahoo Finance example. Unfortunately, this rules out using YQL/datatables.
Regarding any questions about Morningstar's copyright: I'm screen-scraping for personal, non-commercial use, which their copyright notice allows in the last sentence of the second paragraph:
You are entitled to use the Information it contains for your private, non-commercial use only. Morningstar Copyright.