I have a raster time series stored in multiple GeoTIFF
files (*.tif
) that I'd like to convert to a single NetCDF
file. The data is uint16
.
I could probably use gdal_translate
to convert each image to netcdf using:
gdal_translate -of netcdf -co FORMAT=NC4 20150520_0164.tif foo.nc
and then some scripting with NCO
to extract dates from filenames and then concatenate, but I was wondering whether I might do this more effectively in Python using xarray
and it's new rasterio
backend.
I can read a file easily:
import glob
import xarray as xr
f = glob.glob('*.tif')
da = xr.open_rasterio(f[0])
da
which returns
<xarray.DataArray (band: 1, y: 5490, x: 5490)>
[30140100 values with dtype=uint16]
Coordinates:
* band (band) int64 1
* y (y) float64 5e+05 5e+05 5e+05 5e+05 5e+05 4.999e+05 4.999e+05 ...
* x (x) float64 8e+05 8e+05 8e+05 8e+05 8.001e+05 8.001e+05 ...
Attributes:
crs: +init=epsg:32620
and I can write one of these to NetCDF:
ds.to_netcdf('foo.nc')
but ideally I would be able to use something like xr.open_mfdataset
, write the time values (extracted from the filenames) and then write the entire aggregation to netCDF
. And have dask
handle the out-of-core memory issues. :-)
Can something like this be done with xarray
and dask
?