I have several 2D numpy arrays (matrices), and for each one I would like to convert it into a vector containing the values of the array and a vector containing each value's row/column index.
For example I might have an array like this:
x = np.array([[3, 1, 4],
[1, 5, 9],
[2, 6, 5]])
and I basically want the values
[3, 1, 4, 1, 5, 9, 2, 6, 5]
and their position
[[0,0], [0,1], [0,2], [1,0], [1,1], [1,2], [2,0], [2,1], [2,2]]
My end goal is to put these into a pandas DataFrame as columns like this:
V | x | y
--+---+---
3 | 0 | 0
1 | 0 | 1
4 | 0 | 2
1 | 1 | 0
5 | 1 | 1
9 | 1 | 2
2 | 2 | 0
6 | 2 | 1
5 | 2 | 2
where V is the value, x is the row position (index), and y is the column position (index).
I think I can hack something together, but I'm trying to find an efficient way of doing this rather than fumbling around. For example, I know I can get the values using something like x.reshape(x.size, 1), and that I could try to create the index columns from x.shape, but it seems like there should be a better way.
I don't know if it's the most efficient, but numpy.meshgrid is designed for this:
import numpy as np

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])
# XX holds each element's column index, YY its row index
XX, YY = np.meshgrid(np.arange(x.shape[1]), np.arange(x.shape[0]))
table = np.vstack((x.ravel(), XX.ravel(), YY.ravel())).T
print(table)
This produces:
[[3 0 0]
[1 1 0]
[4 2 0]
[1 0 1]
[5 1 1]
[9 2 1]
[2 0 2]
[6 1 2]
[5 2 2]]
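Note that the second column of table is the column index (from XX) and the third is the row index (from YY). Then I think
df = pandas.DataFrame(table, columns=['V', 'y', 'x'])[['V', 'x', 'y']]
will give you your desired data frame.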
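If you want the columns to come out directly in the question's order (value, row, column), a variant of the same meshgrid approach passes `indexing='ij'`, so that the first output grid holds row indices and the second holds column indices. A sketch, assuming the column names `V`, `x`, `y` from the question:

```python
import numpy as np
import pandas as pd

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])

# With indexing='ij', RR[i, j] == i (row index) and CC[i, j] == j (column index)
RR, CC = np.meshgrid(np.arange(x.shape[0]), np.arange(x.shape[1]), indexing='ij')

# ravel() flattens all three grids in the same row-major order
df = pd.DataFrame({'V': x.ravel(), 'x': RR.ravel(), 'y': CC.ravel()})
print(df)
```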
You could also let pandas do the work for you, since you'll be putting the result in a DataFrame anyway:
import numpy as np
import pandas as pd

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])
df = pd.DataFrame(x)
# unstack the columns so that they become the outer index level, then reset the
# index so that both index levels become ordinary columns
df = df.unstack().reset_index()
df
level_0 level_1 0
0 0 0 3
1 0 1 1
2 0 2 2
3 1 0 1
4 1 1 5
5 1 2 6
6 2 0 4
7 2 1 9
8 2 2 5
# name the columns: level_0 is the original column label (y), level_1 the row label (x)
df.columns = ['y', 'x', 'V']
# put V first, then x and y
df = df[['V', 'x', 'y']]
# note: the rows come out column by column (column-major order)
df
V x y
0 3 0 0
1 1 1 0
2 2 2 0
3 1 0 1
4 5 1 1
5 6 2 1
6 4 0 2
7 9 1 2
8 5 2 2
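If you'd rather get the rows in row-major order (matching the question's table) while still letting pandas build the index columns, one option is to attach a MultiIndex to the flattened values and reset it. A sketch using `pd.MultiIndex.from_product`, with the level names `x` and `y` taken from the question:

```python
import numpy as np
import pandas as pd

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])

# Every (row, column) pair in row-major order, matching x.ravel()
index = pd.MultiIndex.from_product(
    [range(x.shape[0]), range(x.shape[1])], names=['x', 'y'])

# reset_index() turns the two named index levels into 'x' and 'y' columns
df = pd.Series(x.ravel(), index=index, name='V').reset_index()
df = df[['V', 'x', 'y']]
print(df)
```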
Another way:
import numpy as np

arr = np.array([[3, 1, 4],
                [1, 5, 9],
                [2, 6, 5]])
# build out rows array: row i is filled with the value i
x = np.arange(arr.shape[0]).reshape(arr.shape[0], 1).repeat(arr.shape[1], axis=1)
# build out columns array: column j is filled with the value j
y = np.arange(arr.shape[1]).reshape(1, arr.shape[1]).repeat(arr.shape[0], axis=0)
# combine into table
table = np.vstack((arr.reshape(arr.size), x.reshape(arr.size), y.reshape(arr.size))).T
print(table)
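The two `repeat` constructions can also be replaced by `np.indices`, which returns the row-index and column-index grids for a given shape in a single call. A sketch of the same table built that way:

```python
import numpy as np

arr = np.array([[3, 1, 4],
                [1, 5, 9],
                [2, 6, 5]])

# rows[i, j] == i and cols[i, j] == j, each with the same shape as arr
rows, cols = np.indices(arr.shape)

# value, row index, column index as the three columns of the table
table = np.column_stack((arr.ravel(), rows.ravel(), cols.ravel()))
print(table)
```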
You can simply use loops:
import numpy as np

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])
values = []
coordinates = []
data_frame = []
for v in range(len(x)):          # v: row index
    for h in range(len(x[v])):   # h: column index
        values.append(x[v][h])
        coordinates.append((v, h))
        data_frame.append((x[v][h], v, h))
        print('%s | %s | %s' % (x[v][h], v, h))
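Instead of printing, the (value, row, column) records can be collected and handed straight to pandas. A sketch, assuming the column names from the question and using `np.ndindex` (an iterator over every index tuple of a shape) in place of the nested loops:

```python
import numpy as np
import pandas as pd

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])

# np.ndindex yields (row, column) tuples in row-major order
records = [(x[r, c], r, c) for r, c in np.ndindex(x.shape)]

df = pd.DataFrame(records, columns=['V', 'x', 'y'])
print(df)
```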
You can try this using itertools
import itertools
import numpy as np
import pandas as pd
def convert2dataframe(array):
a, b = array.shape
x, y = zip(*list(itertools.product(range(a), range(b))))
df = pd.DataFrame(data={'V':array.ravel(), 'x':x, 'y':y})
return df
This works for 2D arrays of any shape, not just square matrices.
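For example, calling convert2dataframe on a non-square 2×3 array (an input made up here purely for illustration) gives one row per element, with x running over the 2 rows and y over the 3 columns:

```python
import itertools

import numpy as np
import pandas as pd


def convert2dataframe(array):
    a, b = array.shape
    # Cartesian product of row and column indices, in row-major order
    x, y = zip(*itertools.product(range(a), range(b)))
    return pd.DataFrame(data={'V': array.ravel(), 'x': x, 'y': y})


arr = np.array([[10, 20, 30],
                [40, 50, 60]])
df = convert2dataframe(arr)
print(df)
```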
I am resurrecting this because I think I know a different answer that is way easier to understand. Here is how I do it:
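import numpy as np

# x is the array from the question; one row per element:
# the value, followed by its index along each axis (stored as float32)
xn = np.zeros((np.size(x), np.ndim(x) + 1), dtype=np.float32)
row = 0
for ind, data in np.ndenumerate(x):
    xn[row, 0] = data
    xn[row, 1:] = np.asarray(ind)
    row += 1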
xn then contains:
[[ 3. 0. 0.]
[ 1. 0. 1.]
[ 4. 0. 2.]
[ 1. 1. 0.]
[ 5. 1. 1.]
[ 9. 1. 2.]
[ 2. 2. 0.]
[ 6. 2. 1.]
[ 5. 2. 2.]]
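The same idea extends to arrays of any number of dimensions without a Python loop: `np.unravel_index` converts flat positions into one index array per axis. A sketch that keeps the array's integer dtype instead of the float32 buffer used above:

```python
import numpy as np

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])

# One index array per axis, for every flat position 0..x.size-1 (row-major)
idx = np.unravel_index(np.arange(x.size), x.shape)

# First column: the values; remaining columns: the index along each axis
xn = np.column_stack((x.ravel(),) + idx)
print(xn)
```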