Python numpy recarray: Can one obtain a view into

Posted 2019-07-17 00:10

Question:

I have a numpy structured array of the following form:

import numpy as np

x = np.array([(1,2,3)]*2, [('t', np.int16), ('x', np.int8), ('y', np.int8)])

I now want to generate views into this array that team up 't' with either 'x' or 'y'. The usual syntax creates a copy:

v_copy = x[['t', 'y']]
v_copy
#array([(1, 3), (1, 3)], 
#     dtype=[('t', '<i2'), ('y', '|i1')])

v_copy.base is None
#True

This is not unexpected, since picking two fields is "fancy indexing", at which point numpy gives up and makes a copy. Since my actual records are large, I want to avoid the copy at all costs.
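On NumPy 1.16 and later, indexing with a list of field names returns a view rather than a copy, so the copy shown above reflects older versions. A short check, assuming NumPy >= 1.16; it uses `np.shares_memory` because that tests the underlying memory directly:

```python
import numpy as np

# Same structured array as in the question.
x = np.array([(1, 2, 3)] * 2,
             dtype=[('t', np.int16), ('x', np.int8), ('y', np.int8)])

# On NumPy >= 1.16 this multi-field index returns a view, not a copy.
v = x[['t', 'y']]
print(np.shares_memory(v, x))  # True on NumPy >= 1.16
print(v.dtype.itemsize)        # 4: the view keeps the original record layout

# A write through the view lands in x.
v['y'][0] = 9
print(x['y'].tolist())         # [9, 3]
```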

Yet the required elements can perfectly well be reached within numpy's strided memory model. Looking at the individual bytes in memory:

x.view(np.int8)
#array([1, 0, 2, 3, 1, 0, 2, 3], dtype=int8)
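The stride values follow directly from the dtype: `dtype.fields` gives each field's byte offset and `dtype.itemsize` gives the record length. A minimal sketch; it spells the dtype as little-endian `'<i2'` (an assumption matching the byte dump above) so the printed bytes do not depend on the host:

```python
import numpy as np

# Explicit little-endian int16 so the byte layout is host-independent.
dt = np.dtype([('t', '<i2'), ('x', 'i1'), ('y', 'i1')])
x = np.array([(1, 2, 3)] * 2, dtype=dt)

# dtype.fields maps each name to (field dtype, byte offset).
offsets = {name: dt.fields[name][1] for name in dt.names}
print(offsets)                   # {'t': 0, 'x': 2, 'y': 3}
print(dt.itemsize)               # 4 bytes per record, hence x.strides == (4,)
print(x.view(np.int8).tolist())  # [1, 0, 2, 3, 1, 0, 2, 3]
```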

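one can work out the necessary strides. Each record is 4 bytes long and 'y' sits 3 bytes after the start of 't', so a (2, 2) array of bytes with strides (4, 3) picks out the low byte of 't' and the byte of 'y' in each record: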

v = np.recarray((2,2), [('b', np.int8)], buf=x, strides=(4,3))
v
#rec.array([[(1,), (3,)],
#    [(1,), (3,)]], 
#    dtype=[('b', '|i1')])
v.base is x
#True
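The same byte-level view can be built on a plain ndarray with `np.lib.stride_tricks.as_strided`, which the question also brings up. This is a sketch for illustration, not the asker's code; note that it still yields single `int8` bytes, so 't' is not reinterpreted as `int16`, which is exactly the limitation the question runs into:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.array([(1, 2, 3)] * 2,
             dtype=[('t', '<i2'), ('x', 'i1'), ('y', 'i1')])

# Flat byte view of the records; it shares x's memory.
b = x.view(np.int8)

# Row stride = record size (4 bytes); column stride = offset of 'y' (3 bytes),
# so column 0 hits the low byte of 't' and column 1 hits 'y'.
v = as_strided(b, shape=(2, 2), strides=(4, 3))
print(v.tolist())              # [[1, 3], [1, 3]]
print(np.shares_memory(v, x))  # True
```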

Clearly, v points to the correct locations in memory without having created a copy. Unfortunately, numpy won't allow me to reinterpret these memory locations as the original data types:

v_view = v.view([('t', np.int16), ('y', np.int8)])
#ValueError: new type not compatible with array.

Is there a way to trick numpy into doing this cast, so that an array v_view equivalent to v_copy is created, but without having made a copy? Perhaps working directly on v.__array_interface__, as is done in np.lib.stride_tricks.as_strided()?

Answer 1:

You can construct a suitable dtype like so

dt2 = np.dtype(dict(names=('t', 'x'), formats=(np.int16, np.int8), offsets=(0, 2)))
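A quick inspection of `dt2`, as a sketch: without an explicit itemsize the dtype ends right after 'x', so it is only 3 bytes long, shorter than the 4-byte records of `x`. That is why the recarray call passes `strides=x.strides`, which keeps the 4-byte step between records:

```python
import numpy as np

# The dtype from the answer: two fields at explicit byte offsets.
dt2 = np.dtype(dict(names=('t', 'x'), formats=(np.int16, np.int8), offsets=(0, 2)))

print(dt2.names)                              # ('t', 'x')
print([dt2.fields[n][1] for n in dt2.names])  # [0, 2]
print(dt2.itemsize)                           # 3: no itemsize given, so it ends after 'x'
```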

and then do

y = np.recarray(x.shape, buf=x, strides=x.strides, dtype=dt2)
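To confirm that `y` really is a view, one can write through it and watch `x` change. A minimal check, reusing the names from the answer:

```python
import numpy as np

x = np.array([(1, 2, 3)] * 2,
             dtype=[('t', np.int16), ('x', np.int8), ('y', np.int8)])
dt2 = np.dtype(dict(names=('t', 'x'), formats=(np.int16, np.int8), offsets=(0, 2)))

# strides=x.strides keeps the 4-byte record step even though dt2 is 3 bytes.
y = np.recarray(x.shape, buf=x, strides=x.strides, dtype=dt2)

y.x[1] = 7                     # write through the recarray
print(x['x'].tolist())         # [2, 7]: x sees the change, so nothing was copied
print(np.shares_memory(y, x))  # True
```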

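In NumPy versions newer than 1.6, the dict form also accepts an itemsize key. Setting it to the original record size makes dt2 as large as the dtype of x, so a plain view works: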

dt2 = np.dtype(dict(names=('t', 'x'), formats=(np.int16, np.int8), offsets=(0, 2), itemsize=4))
y = x.view(dt2)
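With `itemsize=4` the new dtype has the same size as the original records, which is what `ndarray.view` needs in order to reinterpret the memory. A sketch of the resulting write-through behaviour, assuming a NumPy version newer than 1.6:

```python
import numpy as np

x = np.array([(1, 2, 3)] * 2,
             dtype=[('t', np.int16), ('x', np.int8), ('y', np.int8)])

# itemsize=4 matches the original record size, so .view() is allowed.
dt2 = np.dtype(dict(names=('t', 'x'), formats=(np.int16, np.int8),
                    offsets=(0, 2), itemsize=4))
y = x.view(dt2)

y['t'][0] = 100             # modify through the view
print(x['t'].tolist())      # [100, 1]
print(y.base is x)          # True
```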


Answer 2:

This works with numpy 1.6.x and avoids creating a recarray:

dt2 = {'t': (np.int16, 0), 'y': (np.int8, 3)}
v_view = np.ndarray(x.shape, dtype=dt2, buffer=x, strides=x.strides)
v_view
#array([(1, 3), (1, 3)], 
#    dtype=[('t', '<i2'), ('', '|V1'), ('y', '|i1')])
v_view.base is x
#True
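The `{name: (type, offset)}` dict used for `dt2` is a shorthand that NumPy still accepts but discourages. A sketch comparing it with the explicit names/formats/offsets form (`dt_short` and `dt_long` are just illustrative names):

```python
import numpy as np

# The {name: (type, offset)} shorthand from the answer...
dt_short = np.dtype({'t': (np.int16, 0), 'y': (np.int8, 3)})

# ...describes the same layout as the names/formats/offsets form.
dt_long = np.dtype({'names': ['t', 'y'], 'formats': [np.int16, np.int8],
                    'offsets': [0, 3], 'itemsize': 4})

print(dt_short == dt_long)   # True
print(dt_short.itemsize)     # 4: byte 2 is unnamed padding between 't' and 'y'
```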

One can wrap this in a subclass of np.ndarray:

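class arrayview(np.ndarray):
    """View of a structured array restricted to a subset of its fields."""
    def __new__(subtype, x, fields):
        # Reuse each field's (dtype, offset) entry from the original dtype.
        dtype = {f: x.dtype.fields[f] for f in fields}
        # Wrap x's buffer with x's own strides, so no data is copied.
        return np.ndarray.__new__(subtype, x.shape, dtype,
                                  buffer=x, strides=x.strides)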

v_view = arrayview(x, ('t', 'y'))
v_view
#arrayview([(1, 3), (1, 3)], 
#    dtype=[('t', '<i2'), ('', '|V1'), ('y', '|i1')])
v_view.base is x
#True
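If a subclass is not needed, the same idea fits in a small function that builds an explicit dtype with the original offsets and the full record itemsize, then calls `.view()`. `field_view` is a hypothetical helper name, and the approach assumes a NumPy version that accepts `itemsize` in the dict form (newer than 1.6):

```python
import numpy as np

def field_view(arr, fields):
    """Hypothetical helper: a view of `arr` exposing only `fields`.

    Keeps each field's original offset and the full record itemsize,
    so arr.view() can reinterpret the memory without copying.
    """
    fdict = arr.dtype.fields
    dt = np.dtype({'names': list(fields),
                   'formats': [fdict[f][0] for f in fields],
                   'offsets': [fdict[f][1] for f in fields],
                   'itemsize': arr.dtype.itemsize})
    return arr.view(dt)

x = np.array([(1, 2, 3)] * 2,
             dtype=[('t', np.int16), ('x', np.int8), ('y', np.int8)])
v = field_view(x, ('t', 'y'))

v['y'][:] = 5               # write through the view
print(x['y'].tolist())      # [5, 5]
print(v.base is x)          # True
```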