I am trying to produce gridded rainfall data over the UK by using a Thin Plate Spline algorithm and eliminate values that are not over land in R - a process I can only achieve so far manually. The problem is challenging (for me) and even challenging to explain - so I will step through what I have done so far. Any help will be greatly welcome.
First, I load a data table into R that represents rainfall on a single day from a number of point location weather stations, and each row of the data table contains the date, id of the station, the easting and northing of the station, the daily rainfall at that site and the average rainfall for the year. I also load the libraries fields,maptools and gstat.
library(fields)
library(maptools)
library(gstat)
dat <- read.table("1961month1day1.csv", header=T, sep=",", quote = "")
names(dat) <- c("easting", "northing", "dailyrainfall","avaerageyearlyrainfall")
Here is a sample of the data:
dput(head(dat, 20))
structure(list(easting = c(130000L, 145000L, 155000L, 170000L,
180000L, 180000L, 180000L, 180000L, 185000L, 200000L, 200000L,
205000L, 210000L, 220000L, 225000L, 230000L, 230000L, 230000L,
230000L, 235000L), northing = c(660000L, 30000L, 735000L, 40000L,
30000L, 45000L, 60000L, 750000L, 725000L, 50000L, 845000L, 65000L,
770000L, 105000L, 670000L, 100000L, 620000L, 680000L, 95000L,
120000L), dailyrainfall = c(9.4, 4.1, 12.4, 2.8, 1.3, 3.6, 4.8, 26.7, 19.8,
4.6, 1.7, 4.1, 12.7, 1.8, 3, 5.3, 1, 1.5, 1.5, 4.6), averageyearlyrainfall = c(1334.626923,
1123.051923, 2072.030769, 1207.584615, 928, 1089.334615, 880.0884615,
2810.323077, 1933.719231, 1215.642308, 2644.171154, 1235.913462,
2140.111538, 1010.436538, 1778.432692, 1116.934615, 912.2807692,
1579.386538, 1085.498077, 1250.601923)), .Names = c("easting",
"northing", "dailyrainfall", "averageyearlyrainfall"), row.names = c(NA, 20L), class = "data.frame")
I can then fit a thin plate spline to the data so as to give me a gridded surface and plot the surface:
fit <- Tps(cbind(dat$easting,dat$northing),dat$dailyrainfall)
surface(fit)
I can then create a grid of the UK, in 1km steps by using:
xvals <- seq(0, 700000, by=1000)
yvals <- seq(0, 1250000, by=1000)
and then plot the surface onto this grid and write the data into a table:
griddf <- expand.grid(xvals, yvals)
griddf$pred <- predict(fit, x=as.matrix(griddf))
write.table(griddf, file="1Jan1961grid.csv", sep=",", qmethod="double")
Great - so far so good. I now have converted my point data to 1km gridded data over the entire 0 to 700000 (E) and 0 to 1250000 (N) grid. The written data table is a list containing an index, an easting, a northing and the predicted rainfall value.
Now the challenge - I want to eliminate any values from this list that are not over land. I can achieve this manually by loading the data into excel (or Access) and comparing the data to another file that contains the same grid and the average yearly rainfall (the file is called 1kmgridaveragerainfall.csv). Here is a sample of this file:
dput(head(dat1, 20))
structure(list(easting = c(-200000L, -200000L, -200000L, -200000L,
-200000L, -200000L, -200000L, -200000L, -200000L, -200000L, -200000L,
-200000L, -200000L, -200000L, -200000L, -200000L, -200000L, -200000L,
-200000L, -200000L), northing = c(1245000L, 1240000L, 1235000L,
1230000L, 1225000L, 1220000L, 1215000L, 1210000L, 1205000L, 1200000L,
1195000L, 1190000L, 1185000L, 1180000L, 1175000L, 1170000L, 1165000L,
1160000L, 1155000L, 1150000L), averageyearlyrainfall = c(-9999, -9999, -9999,
-9999, -9999, -9999, -9999, -9999, -9999, -9999, -9999, -9999,
-9999, -9999, -9999, -9999, -9999, -9999, -9999, -9999)), .Names = c("easting",
"northing", "averageyearlyrainfall"), row.names = c(NA, 20L), class = "data.frame")
Any grid square that is not over land has an average yearly rainfall of -9999. Hence once matched (i.e. using vlookup or a query in Access) I can filter out values that have this -9999 value and this leaves me with a data table that has the easting and northing and daily rainfall and average yearly rainfall for land values only. I can then load this back into R and plot this using:
quilt.plot(cbind(dat$easting,dat$northing),dat$mm, add.legend=TRUE, nx=654, ny=1209,xlim=c(0,700000),ylim=c(0,1200000))
and I am left with a plot of rainfall over the UK land (and not the sea area).
So, can anybody suggest a way of achieving the same but without all the filtering etc using excel or access i.e. can the same be achieved using R only? Is there a way of loading both data tables into R at the beginning and somehow fitting the TPS of the point data over the average data so that grid squares that are equal to -9999 are not plotted.
I know that the TPS can be weighted using a covariate (Z) - does this help at all? i.e.
fit <- Tps(cbind(dat$easting,dat$northing),dat$dailyrainfall, Z=dat$averageyearlyrainfall)
Also, when I perform surface(fit) of the original TPS, how do I extend the plot to the edges of the plot - I'm sure I've read this somewhere where you put something like interp=TRUE but this doesn't work.
Any help would be most appreciated
Thanks, Tony
If you have already got to the point where you have the two dataframes you should be able to merge them into a new dataframe and filter/subset the result.
This is the first data frame:
This is the second dataframe:
Merged dataframe:
Final dataframe with -9999 row removed:
Okay, we can't replicate your data so here's a few pointers with some samples:
First make a matrix with what is like your daily average rainfall data with -9999 marking non-land:
Then make a matrix that is your grid of values:
Now we want to replace all the values in
r
wherem
has an -9999 value withNA
:Now if you can translate that to your data objects, then its job done, right?