Flatten numpy array but also keep index of value positions

Posted 2020-06-19 19:13

Question:

I have several 2D numpy arrays (matrices), and for each one I would like to convert it to a vector containing the values of the array and a vector containing each value's row/column index.

For example I might have an array like this:

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])

and I basically want the values

[3, 1, 4, 1, 5, 9, 2, 6, 5]

and their position

[[0,0], [0,1], [0,2], [1,0], [1,1], [1,2], [2,0], [2,1], [2,2]]

My end goal is to put these into a pandas DataFrame as columns like this:

V | x | y
--+---+---
3 | 0 | 0
1 | 0 | 1
4 | 0 | 2
1 | 1 | 0
5 | 1 | 1
9 | 1 | 2
2 | 2 | 0
6 | 2 | 1
5 | 2 | 2

where V is the value, x is the row position (index), and y is the column position (index).

I think I can hack something together, but I'm trying to find the efficient way of doing this rather than fumbling around. For example, I know I can get the values using something like x.reshape(x.size, 1) and that I could try to create the index columns from x.shape, but it seems like there should be a better way.
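The "index columns from x.shape" idea does not need a hack: `np.unravel_index` turns flat positions 0..size-1 into one index array per axis for a given shape. A minimal sketch, assuming NumPy and pandas are installed and relying on `ravel()` and `unravel_index` both using row-major (C) order:

```python
import numpy as np
import pandas as pd

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])

# flat positions 0..size-1 -> (row indices, column indices), in C order
rows, cols = np.unravel_index(np.arange(x.size), x.shape)
df = pd.DataFrame({'V': x.ravel(), 'x': rows, 'y': cols})
print(df)
```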

Answer 1:

I don't know if it's the most efficient way, but numpy.meshgrid is designed for this:

import numpy as np
import pandas as pd

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])
# XX holds each cell's column index, YY its row index
XX, YY = np.meshgrid(np.arange(x.shape[1]), np.arange(x.shape[0]))
# columns: value, row index, column index
table = np.vstack((x.ravel(), YY.ravel(), XX.ravel())).T
print(table)

This produces:

[[3 0 0]
 [1 0 1]
 [4 0 2]
 [1 1 0]
 [5 1 1]
 [9 1 2]
 [2 2 0]
 [6 2 1]
 [5 2 2]]

Then df = pd.DataFrame(table, columns=['V', 'x', 'y']) gives the desired data frame.
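`np.meshgrid` also accepts `indexing='ij'`, which returns the row-index grid first and so matches the x (row), y (column) convention directly. A sketch under the same assumptions (NumPy and pandas installed), tried here on a non-square array:

```python
import numpy as np
import pandas as pd

x = np.array([[3, 1, 4, 1],
              [5, 9, 2, 6]])

# with indexing='ij', rows[i, j] == i and cols[i, j] == j
rows, cols = np.meshgrid(np.arange(x.shape[0]), np.arange(x.shape[1]), indexing='ij')
df = pd.DataFrame({'V': x.ravel(), 'x': rows.ravel(), 'y': cols.ravel()})
print(df)
```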



Answer 2:

You could also let pandas do the work for you since you'll be using it in a dataframe:

import numpy as np
import pandas as pd

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])
df = pd.DataFrame(x)
# stack the columns into the inner level of a (row, column) index, then reset
# the index so that both index levels become ordinary columns
df = df.stack().reset_index()
df

   level_0  level_1  0
0        0        0  3
1        0        1  1
2        0        2  4
3        1        0  1
4        1        1  5
5        1        2  9
6        2        0  2
7        2        1  6
8        2        2  5

# name the columns and move V to the front
df.columns = ['x', 'y', 'V']
df = df[['V', 'x', 'y']]
df

   V  x  y
0  3  0  0
1  1  0  1
2  4  0  2
3  1  1  0
4  5  1  1
5  9  1  2
6  2  2  0
7  6  2  1
8  5  2  2
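If the index and column axes are named before stacking, `reset_index` picks those names up and the renaming step goes away. A sketch, assuming a reasonably recent pandas in which `Series.rename` with a scalar sets the series name:

```python
import numpy as np
import pandas as pd

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])

df = pd.DataFrame(x)
df.index.name = 'x'    # row position
df.columns.name = 'y'  # column position
# stack -> Series indexed by (x, y); name it V, then turn the index into columns
out = df.stack().rename('V').reset_index()[['V', 'x', 'y']]
print(out)
```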


Answer 3:

Another way:

import numpy as np

arr = np.array([[3, 1, 4],
                [1, 5, 9],
                [2, 6, 5]])

# build out rows array: every cell holds its row index
x = np.arange(arr.shape[0]).reshape(arr.shape[0], 1).repeat(arr.shape[1], axis=1)
# build out columns array: every cell holds its column index
y = np.arange(arr.shape[1]).reshape(1, arr.shape[1]).repeat(arr.shape[0], axis=0)

# combine into table
table = np.vstack((arr.reshape(arr.size), x.reshape(arr.size), y.reshape(arr.size))).T
print(table)
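The same row and column index vectors can be built directly in flattened form with `np.repeat` and `np.tile`, skipping the 2D intermediates. A minimal sketch on a non-square array (NumPy only):

```python
import numpy as np

arr = np.array([[3, 1, 4, 1],
                [5, 9, 2, 6]])
n_rows, n_cols = arr.shape

# row index repeats each value n_cols times: 0 0 0 0 1 1 1 1
x = np.repeat(np.arange(n_rows), n_cols)
# column index cycles n_rows times: 0 1 2 3 0 1 2 3
y = np.tile(np.arange(n_cols), n_rows)

table = np.column_stack((arr.ravel(), x, y))
print(table)
```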


Answer 4:

You can simply use loops.

import numpy as np

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])
values = []
coordinates = []
data_frame = []
for v in range(len(x)):            # v: row index
    for h in range(len(x[v])):     # h: column index
        values.append(x[v][h])
        coordinates.append((v, h))
        data_frame.append((x[v][h], v, h))
        print('%s | %s | %s' % (x[v][h], v, h))
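To reach the DataFrame from the question, the tuples collected in such a loop can be passed straight to the constructor. A sketch, assuming pandas is installed; iterating with `enumerate` avoids the repeated `x[v][h]` lookups:

```python
import numpy as np
import pandas as pd

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])

records = []
for row_idx, row in enumerate(x):
    for col_idx, value in enumerate(row):
        # one (V, x, y) tuple per element, in row-major order
        records.append((value, row_idx, col_idx))

df = pd.DataFrame(records, columns=['V', 'x', 'y'])
print(df)
```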


Answer 5:

You can try this using itertools:

import itertools
import numpy as np
import pandas as pd

def convert2dataframe(array):
    a, b = array.shape
    x, y = zip(*list(itertools.product(range(a), range(b))))
    df = pd.DataFrame(data={'V':array.ravel(), 'x':x, 'y':y})
    return df

This works for 2D arrays of any shape, not only square matrices.
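`itertools.product` also generalises past two dimensions: unpacking one `range` per axis yields index tuples for an array with any number of dimensions. A sketch, assuming pandas is installed; the helper name `convert_nd` and the `axis0`, `axis1`, ... column names are hypothetical choices for illustration:

```python
import itertools

import numpy as np
import pandas as pd

def convert_nd(array):
    # hypothetical helper: one index column per axis, in C (row-major) order
    idx = list(itertools.product(*(range(n) for n in array.shape)))
    cols = {'V': array.ravel()}
    for axis, positions in enumerate(zip(*idx)):
        cols['axis%d' % axis] = list(positions)
    return pd.DataFrame(cols)

cube = np.arange(8).reshape(2, 2, 2)
print(convert_nd(cube))
```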



Answer 6:

I am resurrecting this because I think I know a different answer that is way easier to understand. Here is how I do it:

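import numpy as np

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])
# one output row per element: the value followed by its index along each axis
xn = np.zeros((np.size(x), np.ndim(x)+1), dtype=np.float32)
row = 0
for ind, data in np.ndenumerate(x):  # ind is the (row, col) index tuple
    xn[row, 0] = data
    xn[row, 1:] = np.asarray(ind)
    row += 1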

In xn we have

[[ 3.  0.  0.]
 [ 1.  0.  1.]
 [ 4.  0.  2.]
 [ 1.  1.  0.]
 [ 5.  1.  1.]
 [ 9.  1.  2.]
 [ 2.  2.  0.]
 [ 6.  2.  1.]
 [ 5.  2.  2.]]
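Because `np.zeros(..., dtype=np.float32)` stores the indices as floats, it can be simpler to feed `np.ndenumerate` into a list comprehension and keep integer types. A minimal sketch, assuming pandas is available and a 2D input (the `x`/`y` column names only fit two axes):

```python
import numpy as np
import pandas as pd

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])

# ndenumerate yields ((row, col), value) pairs in C order
rows = [(value, *index) for index, value in np.ndenumerate(x)]
df = pd.DataFrame(rows, columns=['V', 'x', 'y'])
print(df)
```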