Suppose I have a NumPy structured array with various numeric datatypes. As a basic example,
import numpy as np
my_data = np.array([(17, 182.1), (19, 175.6)], dtype='i2,f4')
How can I cast this into a regular NumPy array of floats?
From this answer, I know I could use
np.array(my_data.tolist())
but apparently this is slow, since it means you "convert an efficiently packed NumPy array to a regular Python list".
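To make the cost of the `tolist()` route concrete, here is a small sketch (using the example array from the question) showing what the intermediate list contains and what dtype `np.array` infers from it. One side effect worth noting: because the values pass through Python floats, the result comes back as float64, not float32.

```python
import numpy as np

my_data = np.array([(17, 182.1), (19, 175.6)], dtype='i2,f4')

# tolist() unpacks every record into a Python tuple of Python int/float
# objects, so each value becomes a separate boxed Python object.
as_list = my_data.tolist()

# np.array() must then infer a dtype from those Python objects; Python
# floats map to float64, so the result is float64 rather than float32.
result = np.array(as_list)
print(result.dtype, result.shape)  # float64 (2, 2)
```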
You can do it easily with Pandas:
>>> import pandas as pd
>>> pd.DataFrame(my_data).values
array([[ 17. , 182.1000061],
[ 19. , 175.6000061]], dtype=float32)
Here's one way (assuming my_data is a one-dimensional structured array):
In [26]: my_data
Out[26]:
array([(17, 182.10000610351562), (19, 175.60000610351562)],
dtype=[('f0', '<i2'), ('f1', '<f4')])
In [27]: np.column_stack([my_data[name] for name in my_data.dtype.names])
Out[27]:
array([[ 17. , 182.1000061],
[ 19. , 175.6000061]], dtype=float32)
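A self-contained version of the field-stacking approach is sketched below. It builds a list rather than passing a generator, because recent NumPy versions deprecate (and later reject) non-sequence iterables such as generators in their stacking functions. The resulting dtype follows NumPy's type promotion of the field dtypes (here int16 and float32 promote to float32).

```python
import numpy as np

my_data = np.array([(17, 182.1), (19, 175.6)], dtype='i2,f4')

# One 1-D array per field; column_stack turns each into a column and
# promotes the mixed field dtypes (int16, float32) to a common dtype.
columns = [my_data[name] for name in my_data.dtype.names]
result = np.column_stack(columns)
print(result.dtype, result.shape)  # float32 (2, 2)
```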
The obvious way works:
>>> my_data
array([(17, 182.10000610351562), (19, 175.60000610351562)],
dtype=[('f0', '<i2'), ('f1', '<f4')])
>>> n = len(my_data.dtype.names) # n == 2
>>> my_data.astype(','.join(['f4']*n))
array([(17.0, 182.10000610351562), (19.0, 175.60000610351562)],
dtype=[('f0', '<f4'), ('f1', '<f4')])
>>> my_data.astype(','.join(['f4']*n)).view('f4')
array([ 17. , 182.1000061, 19. , 175.6000061], dtype=float32)
>>> my_data.astype(','.join(['f4']*n)).view('f4').reshape(-1, n)
array([[ 17. , 182.1000061],
[ 19. , 175.6000061]], dtype=float32)
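The astype/view chain above can be written as a standalone script, sketched below. It relies on the cast producing records made only of contiguous float32 fields, so that `.view('f4')` can reinterpret the bytes directly. For comparison, the sketch also calls `numpy.lib.recfunctions.structured_to_unstructured`, a helper that NumPy 1.16 and later provide for this kind of conversion; it is an alternative to Jaime's approach, not part of it.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

my_data = np.array([(17, 182.1), (19, 175.6)], dtype='i2,f4')
n = len(my_data.dtype.names)

# Cast every field to float32 ('f4,f4' here), so each record becomes
# n contiguous float32 values with no padding between them.
uniform = my_data.astype(','.join(['f4'] * n))

# Reinterpret the packed records as a flat float32 buffer, then restore
# one row per record.
result = uniform.view('f4').reshape(-1, n)

# Helper available in NumPy 1.16+ that performs a similar conversion.
result2 = rfn.structured_to_unstructured(my_data, dtype=np.float32)
print(np.array_equal(result, result2))  # True
```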
A variation on Warren's answer (which copies data by field):
x = np.empty((my_data.shape[0],len(my_data.dtype)),dtype='f4')
for i,n in enumerate(my_data.dtype.names):
x[:,i]=my_data[n]
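As a runnable sketch of the field-by-field copy (same example array; the loop variable is renamed to `name` for readability), note that each assignment moves an entire column at once, so the Python-level loop runs once per field rather than once per row:

```python
import numpy as np

my_data = np.array([(17, 182.1), (19, 175.6)], dtype='i2,f4')

# Preallocate one float32 column per field; len() of a structured dtype
# is its number of fields.
x = np.empty((my_data.shape[0], len(my_data.dtype)), dtype='f4')

# Each assignment casts a whole field (a 1-D array) into column i.
for i, name in enumerate(my_data.dtype.names):
    x[:, i] = my_data[name]

print(x)
```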
Or you could iterate by row. Each r is a record (a np.void scalar that behaves like a tuple), and it has to be converted to a list in order to fill a row of x. With many rows and few fields this will be slower.
for i,r in enumerate(my_data):
x[i,:]=list(r)
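A self-contained sketch of the row-by-row variant follows. It also prints the type of a single record, to show that iterating a 1-D structured array yields NumPy void scalars rather than plain tuples. Because the loop body runs once per row, its cost grows with the number of rows.

```python
import numpy as np

my_data = np.array([(17, 182.1), (19, 175.6)], dtype='i2,f4')
x = np.empty((my_data.shape[0], len(my_data.dtype)), dtype='f4')

# Iterating a 1-D structured array yields one np.void record per row.
for i, r in enumerate(my_data):
    # list(r) unpacks the record's fields so they fill the row in order.
    x[i, :] = list(r)

print(type(my_data[0]).__name__)  # void
```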
It may be instructive to try x.data=r.data and get an error: AttributeError: not enough data for array. x.data is a buffer of 4 float32 values (16 bytes). my_data is a buffer of 2 tuples, each of which contains an int and a float (or a sequence of [int float int float]); my_data.itemsize==6, so it holds only 12 bytes. One way or another, my_data has to be converted to all floats, and the tuple grouping removed.
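The byte arithmetic behind that error can be checked directly. This sketch assumes x is the 2×2 float32 array from above and prints the item size and total size of each buffer, before and after casting the records to all-float32 fields:

```python
import numpy as np

my_data = np.array([(17, 182.1), (19, 175.6)], dtype='i2,f4')
x = np.empty((2, 2), dtype='f4')

# Each record packs an int16 (2 bytes) and a float32 (4 bytes).
print(my_data.itemsize, my_data.nbytes)  # 6 12
# The float target holds four float32 values.
print(x.itemsize, x.nbytes)  # 4 16

# Casting every field to float32 makes the layouts match:
# 2 records of 8 bytes each, the same 16 bytes as x.
converted = my_data.astype('f4,f4')
print(converted.itemsize, converted.nbytes)  # 8 16
```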
But using astype, as Jaime shows, does work:
x.data=my_data.astype('f4,f4').data
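Assigning to an array's data attribute has been deprecated since NumPy 1.12. A sketch that reaches the same result without it casts the records first, then builds a new float32 array over the resulting bytes with `np.frombuffer`. The extra `tobytes()` copy is a simplicity trade-off, not a requirement of the technique.

```python
import numpy as np

my_data = np.array([(17, 182.1), (19, 175.6)], dtype='i2,f4')

# Cast to an all-float32 record layout, then reinterpret its bytes as a
# plain float32 buffer instead of reassigning x.data.
packed = my_data.astype('f4,f4')
x = np.frombuffer(packed.tobytes(), dtype='f4').reshape(-1, 2)

# frombuffer over an immutable bytes object gives a read-only array;
# copy it if you need to write to the result.
x = x.copy()
```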
In quick tests using a 1000-item array with 5 fields, copying field by field is just as fast as using astype.
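A rough way to repeat that comparison on your own machine is sketched below. The five field types and the random integer contents are arbitrary assumptions chosen only to give a mixed-dtype, 1000-record array; no particular timings are implied, and results will vary with hardware and NumPy version.

```python
import timeit

import numpy as np

# Assumed test data: 1000 records with 5 mixed numeric fields.
dtype = 'i2,f4,i4,f8,i8'
rng = np.random.default_rng(0)
my_data = np.zeros(1000, dtype=dtype)
for name in my_data.dtype.names:
    my_data[name] = rng.integers(0, 100, size=my_data.shape[0])
n = len(my_data.dtype.names)


def by_field():
    # Copy one whole field per loop iteration into a float32 column.
    x = np.empty((my_data.shape[0], n), dtype='f4')
    for i, name in enumerate(my_data.dtype.names):
        x[:, i] = my_data[name]
    return x


def by_astype():
    # Cast all fields to float32, then view the records as a 2-D array.
    return my_data.astype(','.join(['f4'] * n)).view('f4').reshape(-1, n)


# Both approaches should produce identical arrays before timing them.
assert np.array_equal(by_field(), by_astype())
for fn in (by_field, by_astype):
    print(fn.__name__, timeit.timeit(fn, number=200))
```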