How to slice a numpy.ndarray made up of numpy.void

Posted 2020-04-17 05:14

Question:

So here's the deal: I have a variable x, which is a numpy.ndarray of size 1000. If I do x[0], I get a numpy.void holding 4 numbers. If I do x[1], I get another numpy.void, also holding 4 numbers, and so on.

What I want to do is simple: slice this data structure so that I extract a numpy matrix of size 1000x3.

How do I do that? Thanks

Answer 1:

Sounds like you have a structured array, something like this simple example:

In [158]: x = np.ones((5,), dtype='i,i,f,f')
In [159]: x
Out[159]: 
array([(1, 1,  1.,  1.), (1, 1,  1.,  1.), (1, 1,  1.,  1.),
       (1, 1,  1.,  1.), (1, 1,  1.,  1.)], 
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<f4'), ('f3', '<f4')])
In [160]: x[0]
Out[160]: (1, 1,  1.,  1.)
In [161]: type(x[0])
Out[161]: numpy.void

x[0] is a record, displayed as a tuple. You access fields by name (not by 'column' index):

In [162]: x['f0']
Out[162]: array([1, 1, 1, 1, 1], dtype=int32)
In [163]: x['f2'] = np.arange(5)

In [165]: x['f1'] = [10,12,8,0,3]
In [166]: x
Out[166]: 
array([(1, 10,  0.,  1.), (1, 12,  1.,  1.), (1,  8,  2.,  1.),
       (1,  0,  3.,  1.), (1,  3,  4.,  1.)], 
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<f4'), ('f3', '<f4')])
In [168]: x[['f2','f3']]    # 2 fields at once
Out[168]: 
array([( 0.,  1.), ( 1.,  1.), ( 2.,  1.), ( 3.,  1.), ( 4.,  1.)], 
      dtype=[('f2', '<f4'), ('f3', '<f4')])
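
If you don't know the field names in advance, they can be read from the dtype. The following is a minimal sketch, assuming the same 4-field layout as the example above (the name first_three is only for illustration). Note that the result is still a 1d structured array, not a 2d numeric one:

```python
import numpy as np

# Assumed layout, matching the example above: two int32 and two float32 fields.
x = np.ones((5,), dtype='i,i,f,f')

# dtype.names is a tuple of the field names, in order.
print(x.dtype.names)             # ('f0', 'f1', 'f2', 'f3')

# Indexing with a list of names keeps only those fields.
first_three = x[list(x.dtype.names[:3])]
print(first_three.shape)         # (5,)
print(first_three.dtype.names)   # ('f0', 'f1', 'f2')
```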

This is handy when 'columns' should contain different things, for example strings in one, integers in another. But it can be awkward to convert such an array to a 2d array of the same numeric type.

view and astype work in limited cases, but tolist is the most robust conversion medium that I'm aware of.

In [179]: x.tolist()
Out[179]: 
[(1, 10, 0.0, 1.0),
 (1, 12, 1.0, 1.0),
 (1, 8, 2.0, 1.0),
 (1, 0, 3.0, 1.0),
 (1, 3, 4.0, 1.0)]
In [180]: np.array(x.tolist())
Out[180]: 
array([[  1.,  10.,   0.,   1.],
       [  1.,  12.,   1.,   1.],
       [  1.,   8.,   2.,   1.],
       [  1.,   0.,   3.,   1.],
       [  1.,   3.,   4.,   1.]])
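
Applied to the original question (4 fields per record, 3 columns wanted), the conversion could look roughly like this. It is a sketch on the 5-row example rather than your 1000-row array, and which three fields you want is my assumption. The second option uses numpy.lib.recfunctions.structured_to_unstructured, which is not part of the session above and, as far as I know, requires NumPy 1.16 or later:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Same layout and values as the example above.
x = np.ones((5,), dtype='i,i,f,f')
x['f1'] = [10, 12, 8, 0, 3]
x['f2'] = np.arange(5)

# Option 1: go through a list of tuples, then slice off the columns you want.
full = np.array(x.tolist())      # 2d array, shape (5, 4)
first_three = full[:, :3]        # shape (5, 3)

# Option 2 (assumes NumPy >= 1.16): pick fields by name, then convert them
# to an ordinary 2d array with one common dtype.
picked = rfn.structured_to_unstructured(x[['f0', 'f1', 'f2']], dtype=float)

print(first_three.shape, picked.shape)
```

With a 1000-record array the same steps should give a 1000x3 result.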

Note that tolist for a structured array gives a list of tuples, whereas tolist for a 2d array gives a list of lists. Going in this direction (structured to 2d) that difference doesn't matter. Going the other way it does: to build a structured array from a list, each record has to be a tuple.
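
For the reverse direction, here is a small sketch of turning a 2d array back into a structured one by converting each row to a tuple first. The dtype and the names rows and as_records are assumptions for illustration:

```python
import numpy as np

# Target structured dtype (assumed, same as the example above).
dt = np.dtype('i,i,f,f')

# An ordinary 2d array; tolist() on it gives a list of lists.
rows = np.array([[1, 10, 0, 1],
                 [1, 12, 1, 1]])

# Convert each row to a tuple so NumPy reads it as one record.
as_records = np.array([tuple(r) for r in rows.tolist()], dtype=dt)

print(as_records.shape)   # (2,)
print(as_records['f1'])   # [10 12]
```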

How did you generate this array? From a csv with genfromtxt? As output from some other numeric package?