R create NetCDF from .CSV

Posted 2019-06-03 17:59

Question:

I am trying to create a NetCDF from a .csv file. I have read several tutorials here and other places and still have some doubts.

I have a table according to this:

lat,long,time,rh,temp
41,-109,6,1,1
40,-107,18,2,2
39,-105,6,3,3
41,-103,18,4,4
40,-109,6,5,2
39,-107,18,6,4

I create the NetCDF using the ncdf4 package in R.

xvals <- data$lon
yvals <- data$lat 
nx <- length(xvals)
ny <- length(yvals)
lon1 <- ncdim_def("longitude", "degrees_east", xvals)
lat2 <- ncdim_def("latitude", "degrees_north", yvals)
time <- data$time
mv <- -999 #missing value to use

var_temp <- ncvar_def("temperatura", "celsius", list(lon1, lat2, time), longname="Temp. da superfície", mv) 

var_rh <- ncvar_def("humidade", "%", list(lon1, lat2, time), longname = "humidade relativa", mv )

ncnew <- nc_create(filename, list(var_temp, var_rh))
ncvar_put(ncnew, var_temp, dadostemp, start=c(1,1,1), count=c(nx,ny,nt))

When I follow the procedure it states that the NC expects 3 times the number of data that I have. I understand why, one matrix for each dimension, since I stated that the variables are according to the Longitude, Latitude and Time.

So, how would I import this kind of data, where I already have one Lon, Lat, Time and other variables for each data acquisition?

Could someone shed some light?

PS: The data used here is not my real data, just some example I was using for the tutorials.

Answer 1:

I think there is more than one problem in your code. Step by step:

Create dimensions

In a NetCDF file, dimensions don't work as key-value pairs; each one is just a vector of values defining what each position along one axis of a variable array means. This means you should create your dimensions like this:

library(ncdf4)

# Each dimension holds the sorted unique coordinate values
# (the column in the table is called "long", not "lon")
xvals <- sort(unique(data$long))
yvals <- sort(unique(data$lat))
lon1 <- ncdim_def("longitude", "degrees_east", xvals)
lat2 <- ncdim_def("latitude", "degrees_north", yvals)
time <- data$time
time_d <- ncdim_def("time", "h", unique(time))

Where I work, we use unlimited dimensions as mere indexes, while a 1-D variable with the same name as the dimension holds the values. I'm not sure how unlimited dimensions work in R. Since you didn't ask about it, I'll leave this out :-)

Define variables

mv <- -999 # missing value to use
var_temp <- ncvar_def("temperatura", "celsius",
                      list(lon1, lat2, time_d),
                      missval = mv,
                      longname = "Temp. da superfície")
var_rh <- ncvar_def("humidade", "%",
                    list(lon1, lat2, time_d),
                    missval = mv,
                    longname = "humidade relativa")

Add data

Create an nc file: ncnew <- nc_create(f, list(var_temp, var_rh))

When adding values, the object holding the data is flattened to a 1-D array and a sequential write is started at the position specified by start. How many values are written along each dimension is controlled by the values in count. If you have data like this:

long, lat, time, t
   1,   1,    1, 1
   2,   1,    1, 2
   1,   2,    1, 3
   2,   2,    1, 4

The command ncvar_put(ncnew, var_temp,data$t,count=c(2,2,1)) would give you what you (probably) expect.
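
Why that ordering works can be sketched in base R, without ncdf4. As far as I know, ncvar_put consumes the values in the same order R stores arrays, with the first dimension (here longitude) varying fastest. The snippet below only mimics that ordering with array(); it does not write a file, and the name ex is just an illustrative stand-in for the small table above.

```r
# The small example table: rows are ordered with long varying fastest.
ex <- data.frame(long = c(1, 2, 1, 2),
                 lat  = c(1, 1, 2, 2),
                 time = c(1, 1, 1, 1),
                 t    = c(1, 2, 3, 4))

# Reshape the flat vector into a long x lat x time array.
# R fills the first dimension first, the order that count = c(2, 2, 1) relies on.
a <- array(ex$t, dim = c(2, 2, 1))

# Every row of the table should land at its own [long, lat, time] position.
for (i in seq_len(nrow(ex))) {
  stopifnot(a[ex$long[i], ex$lat[i], ex$time[i]] == ex$t[i])
}
print(a[, , 1])
```

If the rows were sorted differently (for example with lat varying fastest), the same call would put values in the wrong cells, which is why the index-based approach is safer for unordered tables.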

For your data, the first step is to create the indexes for the dimensions:

data$idx_lon <- match(data$long,xvals)
data$idx_lat <- match(data$lat,yvals)
data$idx_time <- match(data$time,unique(time))
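
As a quick check of what these indexes look like, here is a base-R sketch using the sample table from the question. The name tvals is my own shorthand for the unique time values; everything else follows the names used above.

```r
# Sample table from the question, read from inline text.
data <- read.csv(text = "lat,long,time,rh,temp
41,-109,6,1,1
40,-107,18,2,2
39,-105,6,3,3
41,-103,18,4,4
40,-109,6,5,2
39,-107,18,6,4")

# Sorted unique coordinate values, one vector per dimension.
xvals <- sort(unique(data$long))
yvals <- sort(unique(data$lat))
tvals <- sort(unique(data$time))

# Position of each row's coordinates along each dimension.
data$idx_lon  <- match(data$long, xvals)
data$idx_lat  <- match(data$lat,  yvals)
data$idx_time <- match(data$time, tvals)

print(data[, c("long", "lat", "time", "idx_lon", "idx_lat", "idx_time")])
```

Each row now carries three integers that say which cell of a longitude × latitude × time array its measurements belong to.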

Then create an array with the dimensions appropriate for your data:

# Dimension order must match the variable definition: lon, lat, time
m <- array(mv, dim = c(length(xvals), length(yvals), length(unique(time))))

Then fill the array with your values:

for (i in seq_len(nrow(data))) {
  m[data$idx_lon[i], data$idx_lat[i], data$idx_time[i]] <- data$temp[i]
}

If speed is a concern, you could calculate the linear index in a vectorised way and use it for the value assignment.
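
A minimal sketch of such a vectorised assignment, assuming the array is ordered longitude × latitude × time to match the variable definition. The index values and the names idx, temp and lin are made up for illustration. Indexing an array with a matrix that has one column per dimension assigns all cells in one step; the explicit linear index is shown only to confirm that both address the same cells.

```r
mv <- -999  # missing value, as in the answer

# Hypothetical index columns, standing in for idx_lon, idx_lat, idx_time.
idx <- cbind(c(1, 2, 3, 4, 1, 2),   # longitude index
             c(3, 2, 1, 3, 2, 1),   # latitude index
             c(1, 2, 1, 2, 1, 2))   # time index
temp <- c(1, 2, 3, 4, 2, 4)

# Array pre-filled with the missing value: 4 lon x 3 lat x 2 time.
m <- array(mv, dim = c(4, 3, 2))

# One-step assignment: each row of idx addresses one cell of m.
m[idx] <- temp

# Equivalent explicit linear index (first dimension varies fastest).
d   <- dim(m)
lin <- idx[, 1] + (idx[, 2] - 1) * d[1] + (idx[, 3] - 1) * d[1] * d[2]
stopifnot(all(m[lin] == temp))
```

Cells that no row refers to keep the missing value, just as with the loop.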

Write the data

ncvar_put(ncnew, var_temp,m)

Note that you don't need start and count.

Finally, close the nc file to write the data to disk: nc_close(ncnew). Optionally, I would recommend the ncdump console command to check your file.

Edit

Regarding your question whether to write a complete array or use start and count, I believe both methods work reliably. Which one to prefer depends on your data and your personal preferences.

I think the method of building an array, adding the values and then writing it as a whole is easier to understand. However, which one is more efficient depends on the data. If your data is big and has many NA values, I believe using multiple writes with start and count could be faster. If NAs are rare, creating one array and doing a single write would be faster. If your data is so big that creating an extra array would exceed your available memory, you have to combine both methods.
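
To illustrate the start/count variant, here is a hedged sketch that writes one longitude × latitude slice per time step and then reads the file back. It assumes the ncdf4 package is installed; the temporary file path, the placeholder slice values and the names lon_d, lat_d, nc and out are my own choices, not part of the question.

```r
library(ncdf4)

mv <- -999
lon_d  <- ncdim_def("longitude", "degrees_east",  c(-109, -107, -105, -103))
lat_d  <- ncdim_def("latitude",  "degrees_north", c(39, 40, 41))
time_d <- ncdim_def("time", "h", c(6, 18))
var_temp <- ncvar_def("temperatura", "celsius",
                      list(lon_d, lat_d, time_d), missval = mv)

f  <- tempfile(fileext = ".nc")   # hypothetical output path
nc <- nc_create(f, list(var_temp))

# Write one lon x lat slice per time step instead of one big array.
for (k in 1:2) {
  slice <- matrix(k, nrow = 4, ncol = 3)   # placeholder values for step k
  ncvar_put(nc, var_temp, slice,
            start = c(1, 1, k),    # begin at the k-th time position
            count = c(4, 3, 1))    # full lon and lat extent, one time step
}
nc_close(nc)

# Read the file back to check the result.
nc  <- nc_open(f)
out <- ncvar_get(nc, "temperatura")
nc_close(nc)
stopifnot(identical(dim(out), c(4L, 3L, 2L)), all(out[, , 2] == 2))
```

Only one slice is held in memory at a time, which is the point of this variant when the full array would be too large.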



Tags: r gis netcdf