matplotlib.mlab.griddata very slow and returns arr

2020-06-04 13:12发布

问题:

I am trying to map an irregularly gridded dataset (raw satellite data) with associated latitudes and longitudes to a regularly gridded set of latitudes and longitudes given by basemap.makegrid(). I am using matplotlib.mlab.griddata with mpl_toolkits.natgrid installed. Below is a list of the variables being used as output by whos in ipython and some stats on the variables:

Variable   Type       Data/Info
-------------------------------
datalat    ndarray    666x1081: 719946 elems, type `float32`, 2879784 bytes (2 Mb)
datalon    ndarray    666x1081: 719946 elems, type `float32`, 2879784 bytes (2 Mb)
gridlat    ndarray    1200x1000: 1200000 elems, type `float64`, 9600000 bytes (9 Mb)
gridlon    ndarray    1200x1000: 1200000 elems, type `float64`, 9600000 bytes (9 Mb)
var        ndarray    666x1081: 719946 elems, type `float32`, 2879784 bytes (2 Mb)

In [11]: var.min()
Out[11]: -30.0

In [12]: var.max()
Out[12]: 30.0

In [13]: datalat.min()
Out[13]: 27.339874

In [14]: datalat.max()
Out[14]: 47.05302

In [15]: datalon.min()
Out[15]: -137.55658

In [16]: datalon.max()
Out[16]: -108.41629

In [17]: gridlat.min()
Out[17]: 30.394031556984299

In [18]: gridlat.max()
Out[18]: 44.237140350357713

In [19]: gridlon.min()
Out[19]: -136.17646180595321

In [20]: gridlon.max()
Out[20]: -113.82353819404671

datalat and datalon are the orignal data coordinates

gridlat and gridlon are the coordinates to interpolate to

var contains the actual data

Using these variables, when I call griddata(datalon, datalat, var, gridlon, gridlat) it has taken as long as 20 minutes to complete and returns an array of nan. From looking at the data, the latitudes and longitudes appear to be correct with the original coordinates overlapping a portion of the new area and a few data points lying outside of the new area. Does anyone have any suggestions? The nan values suggest that I'm doing something stupid...

回答1:

It looks like the mlab.griddata routine may introduce additional constraints on your output data that may not be necessary. While the input locations may be anything, the output locations must be a regular grid - since your example is in lat/lon space, your choice of map projection may violate this (i.e. regular grid in x/y is not a regular grid in lat/lon).

You might try the interpolate.griddata routine from SciPy as an alternative - you'll need to combine your location variables into a single array, though, since the call signature is different: something like

import scipy.interpolate
data_locations = np.vstack(datalon.ravel(), datalat.ravel()).T
grid_locations = np.vstack(gridlon.ravel(), gridlat.ravel()).T
grid_data      = scipy.interpolate.griddata(data_locations, val.ravel(),
                                            grid_locations, method='nearest')

for nearest-neighbor interpolation. This gets the locations into an array with 2 columns corresponding to your 2 dimensions. You may also want to perform the interpolation in the transformed space of your map projection.



回答2:

More than likely, griddata is way too hard. It's designed to work with randomly sampled data. Your data is almost certainly regularly sampled -- just not on the same grid as your target output grid.

Look at a much simpler approach like an affine transformation or a series of affine transformations on small chips if the earth's topology or curvature affect yoru results.

There are some out of the box solutions that might help. GDAL is a good example.

Also, this type of issue is often discussed in GIS. See:

https://gis.stackexchange.com/questions/10430/changing-image-projection-using-python



回答3:

If your data is on a grid such that data point at point (datalon[i], datalat[j]) is in data[i,j], then you can use scipy.interpolate.RectBivariateSpline instead of griddata. Some geography-specific libraries may offer more functionality, though.



回答4:

If you use pclormesh, you don't have to do any sort of interpolation. pcolormesh would gladly accept the data structure the way you have given here:

from mpl_toolkits.basemap import Basemap
m = Basemap(-----)
x,y = m(datalon, datalat)
m.pcolormesh(x,y,var)
plt.show()

kindly use this and tell me if this works or not.

However, there is some problem in pcolormesh when there is overlap of the orbit data. Please refer to this question of mine, you may find something useful.

Using pcolormesh for plotting an orbit data