I am having a bit of trouble coding a process or script that would do the following:
I need to get data from this URL:
nomads.ncep.noaa.gov/dods/gfs_hd/gfs_hd20140430/gfs_hd_00z
But the file URLs change (the days and model runs change), so the script has to assume this base structure, with these variables:
Y - Year
M - Month
D - Day
C - Model Forecast/Initialization Hour
F - Model Frame Hour
Like so:
nomads.ncep.noaa.gov/dods/gfs_hd/gfs_hdYYYYMMDD/gfs_hd_CCz
This script would run and then fill in the current date (as YYYYMMDD, plus the CC cycle hour) for those variables.
So while the mission today is to get
http://nomads.ncep.noaa.gov/dods/gfs_hd/gfs_hd20140430/gfs_hd_00z
the variables should always resolve to the current date, in the format of:
http://nomads.ncep.noaa.gov/dods/gfs_hd/gfs_hdYYYYMMDD/gfs_hd_CCz
Can you please advise how to go about getting the URLs for the latest date in this format? Whether it's a script or something with wget, I'm all ears. Thank you in advance.
The easiest solution would be just to mirror the parent directory:
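For instance (a one-line sketch; --mirror turns on recursion with timestamping, and --no-parent keeps wget from climbing above the gfs_hd tree):

```sh
wget --mirror --no-parent http://nomads.ncep.noaa.gov/dods/gfs_hd/
```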
However, if you just want the latest date, you can use Mojo::UserAgent, as demonstrated on Mojocast Episode 5.
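A minimal sketch in that spirit (untested against the live server; it assumes the index's daily links match gfs_hdYYYYMMDD, so the lexically greatest is the newest):

```perl
use Mojo::Base -strict;
use Mojo::UserAgent;

my $url = 'http://nomads.ncep.noaa.gov/dods/gfs_hd';

# Fetch the index page and keep the hrefs that look like daily
# directories; sorting them puts the newest date last.
my $latest = Mojo::UserAgent->new->get($url)->res->dom
  ->find('a[href]')
  ->map(attr => 'href')
  ->grep(qr/gfs_hd\d{8}/)
  ->sort->last;

say $latest;
```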
On May 23rd, 2014, this output the URL of the latest date's directory.
In Python, the requests library can be used to get at the URLs.

You can generate the URL from the base URL string plus a timestamp, using the datetime module's datetime class together with timedelta and strftime to produce the date in the required format. That is, start by getting the current time with datetime.datetime.now(), then in a loop subtract an hour (or whichever time step you think they're using) via timedelta, and keep checking each candidate URL with requests. The first one that exists is the latest one, and you can then do whatever further processing you need with it.
If you need to scrape the contents of the page, scrapy works well for that.

I'd try scraping the index one level up at http://nomads.ncep.noaa.gov/dods/gfs_hd; the last link of the expected form there should take you to the daily downloads pages, where you could do something similar.
Here's an outline of scraping the daily downloads page:
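This is a rough sketch of that outline, using requests and a regular expression in place of a full scrapy spider (the latest_cycle helper is my own name, and it assumes the page lists its gfs_hd_CCz cycle links so that the lexically greatest is the newest):

```python
import re

import requests

def latest_cycle(day_url):
    """Return the URL of the newest gfs_hd_CCz cycle on a daily page."""
    html = requests.get(day_url).text
    # The daily page links to each model cycle as gfs_hd_00z, gfs_hd_06z, ...
    cycles = sorted(set(re.findall(r"gfs_hd_\d{2}z", html)))
    if not cycles:
        return None
    return day_url.rstrip("/") + "/" + cycles[-1]

# e.g. the April 30th, 2014 page from the question:
print(latest_cycle("http://nomads.ncep.noaa.gov/dods/gfs_hd/gfs_hd20140430"))
```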
and scraping the index page that lists the last thirty daily directories would, of course, be very similar.