Python - Kriging (Gaussian Process) in scikit-learn

Posted 2019-04-29 02:11

Question:

I am considering using this method to interpolate some 3D points I have. As an input I have atmospheric concentrations of a gas at various elevations over an area. The data I have appears as values every few feet of vertical elevation for several tens of feet, but horizontally separated by many hundreds of feet (so 'columns' of tightly packed values).

The assumption is that values vary in the vertical direction significantly more than in the horizontal direction at any given point in time.

I want to perform 3D kriging with that assumption accounted for (as a parameter I can adjust or that is statistically defined - either/or).

I believe the scikit-learn module can do this. If it can, my question is how do I create a discrete cell output? That is, output into a 3D grid of data with dimensions of, say, 50 x 50 x 1 feet. Ideally, I would like an output of [x_location, y_location, value] with separation of those (or similar) distances.

Unfortunately I don't have a lot of time to play around with it, so I'm just hoping to figure out if this is possible in Python before delving into it. Thanks!

Answer 1:

Yes, you can definitely do that in scikit-learn.

In fact, it is a basic feature of kriging/Gaussian process regression that you can use anisotropic covariance kernels.

As the manual states (quoted below), you can either set the covariance parameters yourself or have them estimated from the data. You can also choose between a single parameter shared by all dimensions (isotropic) and a separate parameter for each dimension (anisotropic).

theta0 : double array_like, optional An array with shape (n_features, ) or (1, ). The parameters in the autocorrelation model. If thetaL and thetaU are also specified, theta0 is considered as the starting point for the maximum likelihood estimation of the best set of parameters. Default assumes isotropic autocorrelation model with theta0 = 1e-1.
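
The theta0 parameter quoted above belongs to the legacy GaussianProcess class, which was deprecated in scikit-learn 0.18 and removed in 0.20. As a rough sketch of the same idea with the current API, an anisotropic kernel can be built by giving RBF one length scale per dimension. The data below is synthetic, and the starting length scales (300, 300 and 5 feet) are assumptions chosen only to reflect "slow horizontal, fast vertical" variation:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Synthetic stand-in data (an assumption): 5 "columns" of samples, far apart
# horizontally (hundreds of feet) and densely sampled vertically (every 3 feet).
rng = np.random.default_rng(0)
col_xy = rng.uniform(0.0, 1000.0, size=(5, 2))
z = np.arange(0.0, 30.0, 3.0)
coords = np.array([[cx, cy, zi] for cx, cy in col_xy for zi in z])
conc = np.exp(-coords[:, 2] / 10.0) + 0.01 * rng.standard_normal(len(coords))

# One length scale per feature (x, y, z) makes the kernel anisotropic.
# These values are only starting points: fit() tunes them by maximizing the
# marginal likelihood within the bounds. WhiteKernel plays the role of a nugget.
kernel = (
    RBF(length_scale=[300.0, 300.0, 5.0], length_scale_bounds=(1e-1, 1e4))
    + WhiteKernel(noise_level=1e-3)
)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
gpr.fit(coords, conc)

# Fitted length scales for x, y and z (k1 is the RBF part of the sum kernel).
fitted_scales = gpr.kernel_.k1.length_scale
print(fitted_scales)
```

To keep the length scales fixed at values you choose rather than estimating them, pass optimizer=None to GaussianProcessRegressor. With only a handful of columns the horizontal scales are weakly constrained, so the fitted values may end up at a bound.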



Answer 2:

In the 2d case, something like this should work:

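import numpy as np
# Legacy class: deprecated in scikit-learn 0.18 and removed in 0.20
from sklearn.gaussian_process import GaussianProcess

# Regular grid on which to predict
x = np.arange(1, 51)
y = np.arange(1, 51)
X, Y = np.meshgrid(x, y)

# Replace obs_x, obs_y and obs_data with your observed coordinates and values
points = np.column_stack([obs_x, obs_y])
values = obs_data

# theta0 is the starting point; thetaL and thetaU bound the likelihood search
gp = GaussianProcess(theta0=0.1, thetaL=.001, thetaU=1., nugget=0.001)
gp.fit(points, values)

# Predict at every grid node, then reshape back to the grid layout
XY_pairs = np.column_stack([X.flatten(), Y.flatten()])
predicted = gp.predict(XY_pairs).reshape(X.shape)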
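
The same pattern extends to the 3D grid asked about in the question. The following is a minimal sketch only, written against the current GaussianProcessRegressor API rather than the legacy class used above. The observations are random placeholders, and the grid extents (0 to 500 feet horizontally, 0 to 20 feet vertically) and the starting length scales are assumptions to be replaced with values that suit the real data:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Placeholder observations (an assumption): rows of [x, y, z] in feet, one value each.
rng = np.random.default_rng(1)
obs_xyz = np.column_stack([
    rng.uniform(0.0, 500.0, 60),
    rng.uniform(0.0, 500.0, 60),
    rng.uniform(0.0, 20.0, 60),
])
obs_values = np.exp(-obs_xyz[:, 2] / 10.0)

# Anisotropic kernel: separate length scales for x, y and z.
kernel = RBF(length_scale=[200.0, 200.0, 5.0]) + WhiteKernel(noise_level=1e-3)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
gpr.fit(obs_xyz, obs_values)

# Grid nodes spaced 50 ft in x and y and 1 ft in z.
gx = np.arange(0.0, 501.0, 50.0)
gy = np.arange(0.0, 501.0, 50.0)
gz = np.arange(0.0, 21.0, 1.0)
GX, GY, GZ = np.meshgrid(gx, gy, gz, indexing="ij")
grid_xyz = np.column_stack([GX.ravel(), GY.ravel(), GZ.ravel()])

# Predicted value and standard deviation at every grid node.
pred, std = gpr.predict(grid_xyz, return_std=True)

table = np.column_stack([grid_xyz, pred])  # rows of [x, y, z, value]
volume = pred.reshape(GX.shape)            # the same values as a 3D array
```

The table array can be written out with np.savetxt if a flat [x, y, z, value] listing is wanted, while volume keeps the grid layout for slicing by elevation. The std array gives a per-node uncertainty, which tends to be large far from the sampled columns.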