Concise way to filter data in xarray

2020-07-07 02:45发布

问题:

I need to apply a very simple 'match statement' to the values in an xarray array:

  1. Where the value > 0, make 2
  2. Where the value == 0, make 0
  3. Where the value is NaN, make NaN

Here's my current solution. I'm using NaNs, .fillna, & type coercion in lieu of 2d indexing.

valid = date_by_items.notnull()
positive = date_by_items > 0
positive = positive * 2
result = positive.fillna(0.).where(valid)
result

This changes this:

In [20]: date_by_items = xr.DataArray(np.asarray((list(range(3)) * 10)).reshape(6,5), dims=('date','item'))
    ...: date_by_items
    ...: 
Out[20]: 
<xarray.DataArray (date: 6, item: 5)>
array([[0, 1, 2, 0, 1],
       [2, 0, 1, 2, 0],
       [1, 2, 0, 1, 2],
       [0, 1, 2, 0, 1],
       [2, 0, 1, 2, 0],
       [1, 2, 0, 1, 2]])
Coordinates:
  * date     (date) int64 0 1 2 3 4 5
  * item     (item) int64 0 1 2 3 4

... to this:

Out[22]: 
<xarray.DataArray (date: 6, item: 5)>
array([[ 0.,  2.,  2.,  0.,  2.],
       [ 2.,  0.,  2.,  2.,  0.],
       [ 2.,  2.,  0.,  2.,  2.],
       [ 0.,  2.,  2.,  0.,  2.],
       [ 2.,  0.,  2.,  2.,  0.],
       [ 2.,  2.,  0.,  2.,  2.]])
Coordinates:
  * date     (date) int64 0 1 2 3 4 5
  * item     (item) int64 0 1 2 3 4

While in pandas df[df>0] = 2 would be enough. Surely I'm doing something pedestrian and there's an terser way?

回答1:

xarray now supports .where(condition, other), so this is now valid:

result = date_by_items.where(date_by_items > 0, 2)


回答2:

If you are happy to load your data in-memory as a NumPy array, you can modify the DataArray values in place with NumPy:

date_by_items.values[date_by_items.values > 0] = 2

The cleanest way to handle this would be if xarray supported the other argument to where, but we haven't implemented that yet (hopefully soon -- the groundwork has been laid!). When that works, you'll be able to write date_by_items.where(date_by_items > 0, 2).

Either way, you'll need to do this twice to apply both your criteria.