Using R (with the packages rvest, jsonlite and httr), I am trying to programmatically download all the data files available at the following URL:
http://environment.data.gov.uk/ds/survey/index.jsp#/survey?grid=TQ38
I have tried using Chrome's "Inspect" tool and then the Sources tab to find the download options, but the page appears to use ng tables and AngularJS to build the final URL for each dataset. The index.jsp file seems to reference a JavaScript file, downloads/ea.downloads.js, which looks valuable, but I am unsure how to find it to understand which functions I need to call.
Ideally the first result would be a data.frame or data.table with one column holding the Product and one column holding the URLs of the files to be downloaded for each. I could then loop through the rows of the table and download each zip file.
I think this AngularJS issue is similar to this question, but I cannot work out how my code should be adjusted for this example.
I am sure there is a better solution; this is not a final answer, but it is a start. It appears the data you are looking for is stored in a JSON file associated with the main page. Once that file is downloaded, you can process it to determine which files to download.
The naming scheme used by the site is confusing; I will leave it to the experts to determine its meaning.
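A minimal sketch of that idea with httr and jsonlite. The json_url value, the "id" column and the "LIDAR" pattern are hypothetical placeholders rather than details confirmed from the site; the real JSON URL has to be found in the browser's Network tab while the page loads.

```r
library(httr)
library(jsonlite)

# Hypothetical placeholder: replace with the JSON URL seen in the
# browser's Network tab while the survey page loads.
json_url <- "http://example.com/path/to/products.json"

# Download a JSON resource and parse it into an R object
# (a data.frame when the JSON is an array of records).
fetch_catalogue <- function(url) {
  resp <- GET(url)
  stop_for_status(resp)
  fromJSON(content(resp, as = "text", encoding = "UTF-8"))
}

# Keep only the rows whose identifier matches a pattern.
# The column name "id" is an assumption about the JSON layout.
filter_products <- function(catalogue, pattern, column = "id") {
  catalogue[grepl(pattern, catalogue[[column]]), , drop = FALSE]
}

# Usage, once json_url points at the real resource:
# catalogue <- fetch_catalogue(json_url)
# lidar <- filter_products(catalogue, "LIDAR")
```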
A slight expansion on Dave2e's solution, demonstrating how to get the XHR JSON resource with splashr:
splashr requires a Splash server, and the package provides a way to start one with Docker. Read the help on the GitHub page and inside the package to find out how to use it.
This retrieves all the resources loaded by the page:
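A rough sketch of this step, assuming a Splash server is already running locally on the default port so that splashr's splash_local object can reach it. The fetch_har helper and the wait value are illustrative choices, not part of the original answer.

```r
library(splashr)

page_url <- "http://environment.data.gov.uk/ds/survey/index.jsp#/survey?grid=TQ38"

# Render the page in Splash and return the HAR record of every
# resource it requested. `wait` (seconds) gives the AngularJS code
# time to fire its background requests; 2 is an arbitrary guess.
fetch_har <- function(url, wait = 2) {
  render_har(splash_local, url = url, wait = wait)
}

# Usage (needs Docker and a running Splash container):
# splash_container <- start_splash()
# resources <- fetch_har(page_url)
# stop_splash(splash_container)
```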
This targets the background XHR resource with catalogName in it. You'd still need to hunt to find this initially, but once you know the pattern, this becomes a generic operation for other grid points.
Read that in:
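One way this could look, assuming the HAR object follows the standard HAR layout (log, then entries, then request, then url). The find_xhr_url helper is a hypothetical name, and the variable resources stands for the HAR object retrieved in the previous step.

```r
library(jsonlite)

# Pull every request URL out of a HAR-shaped list and keep those
# containing the given literal pattern.
find_xhr_url <- function(har, pattern = "catalogName") {
  urls <- vapply(har$log$entries,
                 function(entry) entry$request$url,
                 character(1))
  grep(pattern, urls, value = TRUE, fixed = TRUE)
}

# Usage, with `resources` being the HAR object from the page render:
# files_json <- find_xhr_url(resources)
# guids <- fromJSON(files_json[1])  # fromJSON() can read straight from a URL
```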
The rest is similar to the other answer:
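A hedged sketch of building the table of products and download URLs asked for in the question. The column names "guid" and "fileName" and the base_url argument are assumptions about the JSON and the download endpoint; adjust them to whatever the parsed data actually contains.

```r
# Build a two-column table: product name and the URL to download it from.
# id_col and name_col are assumed column names in the parsed JSON.
build_download_table <- function(catalogue, base_url,
                                 id_col = "guid", name_col = "fileName") {
  data.frame(
    product = catalogue[[name_col]],
    url = paste(base_url, catalogue[[id_col]], sep = "/"),
    stringsAsFactors = FALSE
  )
}

# Usage with a hypothetical download endpoint:
# downloads <- build_download_table(guids, "http://example.com/download")
```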
Be kind to your network and their server:
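A small sketch of a polite, one-at-a-time download loop. The download_politely name, the 5 second pause and the injectable fetch argument are illustrative choices, not something taken from the original answer.

```r
# Download files sequentially, pausing between requests so the server
# is never hit with more than one transfer at a time.
download_politely <- function(urls, dest_files, pause = 5,
                              fetch = utils::download.file) {
  stopifnot(length(urls) == length(dest_files))
  for (i in seq_along(urls)) {
    fetch(urls[i], dest_files[i], mode = "wb")  # binary mode for zip files
    if (i < length(urls)) Sys.sleep(pause)      # no pause after the last file
  }
  invisible(dest_files)
}

# Usage, with `downloads` being a table of product names and URLs:
# download_politely(downloads$url, downloads$product)
```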
Do this if you think your system and their server can handle 98 simultaneous 70-100 MB file downloads:
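One possible form, relying on the fact that base R's download.file() accepts vectors of URLs and destinations when method = "libcurl" and fetches them simultaneously. The wrapper name is hypothetical.

```r
# Start all downloads at once. With method = "libcurl", download.file()
# takes character vectors of equal length for url and destfile.
download_all_at_once <- function(urls, dest_files) {
  stopifnot(length(urls) == length(dest_files))
  utils::download.file(urls, dest_files, method = "libcurl", mode = "wb")
}

# Usage, with `downloads` being a table of product names and URLs:
# download_all_at_once(downloads$url, downloads$product)
```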