How can I find the maximum across all variables co

Posted 2019-07-21 01:54

Question:

I have an xarray Dataset of daily data with several variables. For each year, I want to extract the maximum q_routed and the values of the other variables on the day that maximum occurs.

    <xarray.Dataset>
    Dimensions:    (latitude: 1, longitude: 1, param_set: 1, time: 17167)
    Coordinates:
      * time       (time) datetime64[ns] 1970-01-01 ...
      * latitude   (latitude) float32 44.5118
      * longitude  (longitude) float32 -111.435
      * param_set  (param_set) |S1 b''
    Data variables:
        ppt        (time, param_set, latitude, longitude) float64 ...
        pet        (time, param_set, latitude, longitude) float64 ...
        obsq       (time, param_set, latitude, longitude) float64 ...
        q_routed   (time, param_set, latitude, longitude) float64 ...

The command below gives me only the yearly maximum of q_routed, without the values of the other variables on that day, so it's not what I want.

    ncdat['q_routed'].groupby('time.year').max()

Trial

I tried this:

    ncdat.groupby('time.year').argmax('time')

which leads to this error:

    ValueError: All-NaN slice encountered

How can I do this?

Answer 1:

For this sort of operation, you probably want to use a custom reduce function:

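    def my_func(ds, dim=None):
        # Find the position of the largest q_routed along `dim`,
        # then select every variable at that position.
        return ds.isel({dim: ds['q_routed'].argmax(dim)})

    # .map is named .apply in older xarray releases
    new = ncdat.groupby('time.year').map(my_func, dim='time')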
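
To see the reduction above in action without the original file, here is a minimal sketch on synthetic data. The dataset `demo`, its random values, and the function name `pick_day_of_max` are made up for illustration. It assumes an xarray release that provides `GroupBy.map`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for ncdat: two years of daily values at a single point.
time = pd.date_range("1970-01-01", "1971-12-31", freq="D")
rng = np.random.default_rng(0)
dims = ("time", "latitude", "longitude")
shape = (time.size, 1, 1)
demo = xr.Dataset(
    {
        "ppt": (dims, rng.random(shape)),
        "q_routed": (dims, rng.random(shape)),
    },
    coords={"time": time, "latitude": [44.5118], "longitude": [-111.435]},
)


def pick_day_of_max(ds, dim="time"):
    # Integer position of the largest q_routed along `dim`, per grid cell.
    idx = ds["q_routed"].argmax(dim)
    # Select every variable at that position, so all values share one day.
    return ds.isel({dim: idx})


yearly = demo.groupby("time.year").map(pick_day_of_max)
print(yearly["q_routed"].values.ravel())  # one q_routed value per year
print(yearly["time"].values.ravel())      # time coordinate carried along by isel
```

The result has a `year` dimension in place of `time`. Each variable holds its value from the day of that year's largest q_routed, and the `time` coordinate should record which day that was.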

Now, argmax doesn't play nice when a slice is entirely NaN (that is the "All-NaN slice encountered" error in the question), so you may want to either apply this function only to locations with data or fill the NaNs beforehand. Something like this could work:

    # Locations that have valid data at the first time step
    mask = ncdat['q_routed'].isel(time=0).notnull()

    # Fill NaNs with a missing-value flag so argmax has something to compare
    ncdat2 = ncdat.fillna(-9999)
    # Run the grouped reduction, then reapply the mask
    # (.map is named .apply in older xarray releases)
    new = ncdat2.groupby('time.year').map(my_func, dim='time').where(mask)
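
A variation on the same idea, sketched here under assumptions rather than taken from the answer: fill only the array used to locate the maximum, and select from the unfilled dataset, so the flag value never appears in the other variables. It uses `-inf` in place of -9999 and builds the mask per group instead of from the first time step. The dataset `grid` and the function name `pick_day_of_max_nan_safe` are hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for ncdat: two grid cells, the second with no data at all.
time = pd.date_range("1970-01-01", "1971-12-31", freq="D")
rng = np.random.default_rng(1)
dims = ("time", "latitude", "longitude")
shape = (time.size, 2, 1)
q_values = rng.random(shape)
q_values[:, 1, :] = np.nan  # latitude index 1 is entirely NaN
grid = xr.Dataset(
    {"ppt": (dims, rng.random(shape)), "q_routed": (dims, q_values)},
    coords={"time": time, "latitude": [44.5, 45.5], "longitude": [-111.435]},
)


def pick_day_of_max_nan_safe(ds, dim="time"):
    q = ds["q_routed"]
    # Fill only the array used to locate the maximum; -inf never beats real data.
    idx = q.fillna(-np.inf).argmax(dim)
    # Cells with at least one valid q_routed value in this group.
    has_data = q.notnull().any(dim)
    # Select from the unfilled dataset, then blank out cells that had no data.
    return ds.isel({dim: idx}).where(has_data)


yearly = grid.groupby("time.year").map(pick_day_of_max_nan_safe)
print(yearly["q_routed"].isel(longitude=0).values)  # NaN for the empty cell
```

Because the mask is computed inside each group, a cell that has data in some years but not others is blanked only for the empty years.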