Reshape list of unequal length lists into numpy ar

Posted 2019-03-05 22:51

I have an array with dtype=object whose elements are lists of coordinate pairs observed at different times, and I want to reshape it into a format that is easier to work with. I managed to do this for a single time step, but I can't get it to work for all time observations.

The length of each observation is different, so perhaps I need to use masked values. The example below should explain more clearly what I want.

# My "input" is:
a = np.array([[], [(2, 0), (2, 2)], [(2, 2), (2, 0), (2, 1), (2, 2)]], dtype=object)

# And my "output" is:

# holding_array_VBPnegl
array([[2, 0],
       [2, 2],
       [2, 1]])

# It doesn't take my for loop over a.shape[0] into account. The expected result is:
test = np.array([[[True, True],
       [True, True],
       [True, True]],

       [[2, 0],
       [2, 2],
       [True, True]],

       [[2, 0],
       [2, 2],
       [2, 1]]])

# with "True" marking the masked values

I have tried using code I found on StackOverflow:

import numpy as np

holding_list_VBPnegl=[]
for i in range(a.shape[0]):
    for x in a[i]:
        if x in holding_list_VBPnegl:
            pass
        else:
            holding_list_VBPnegl.append(x)

print(holding_list_VBPnegl)
holding_array_VBPnegl = np.asarray(holding_list_VBPnegl)

1 answer
甜甜的少女心
#2 · 2019-03-05 23:25

NumPy arrays work best as contiguous blocks of memory, so you first need to preallocate the required amount. You can work out the shape from two numbers. The first is the length of your array a, which is the number of time steps if I read your observations correctly (I'll convert it to a list, since NumPy arrays are a poor container for lists of unequal length). The second is the length of the longest observation (4 in this case, the last element of a).

import numpy as np
a = np.array([[], [(2, 0), (2, 2)], [(2, 2), (2, 0), (2, 1), (2, 2)]], dtype=object)

s = a.tolist()  # Lists are a better container type for your data...
cols = len(s)
rows = max(len(obs) for obs in s)

m = np.full((cols, rows, 2), np.nan)  # NaN marks the slots that will be masked

Now you've preallocated what you need, and the array is ready for masking. All that is left is to fill it with the data you already have:

for rowind, row in enumerate(s):
    if row:  # skip empty observations; their slots stay NaN and get masked
        m[rowind, :len(row), :] = np.array(row)

# Replace NaN with 0 before the integer cast, then mask exactly those slots
result = np.ma.masked_array(np.nan_to_num(m).astype(int), mask=np.isnan(m))
result
masked_array(data =
 [[[-- --]
  [-- --]
  [-- --]
  [-- --]]

 [[2 0]
  [2 2]
  [-- --]
  [-- --]]

 [[2 2]
  [2 0]
  [2 1]
  [2 2]]],
             mask =
 [[[ True  True]
  [ True  True]
  [ True  True]
  [ True  True]]

 [[False False]
  [False False]
  [ True  True]
  [ True  True]]

 [[False False]
  [False False]
  [False False]
  [False False]]],
       fill_value = 999999)
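
For reuse, the preallocate-fill-mask steps above can be folded into one helper. What follows is only a sketch, not part of the original answer: the name pad_to_masked and its width parameter are made up for illustration, and it swaps the explicit loop for a boolean "valid slot" grid built from the observation lengths. It assumes every point is a tuple of exactly width integers.

```python
import numpy as np

def pad_to_masked(observations, width=2):
    """Pad ragged observations into a masked array of shape (n_obs, max_len, width).

    Hypothetical helper: assumes each point holds exactly `width` integers.
    """
    n_obs = len(observations)
    lengths = np.array([len(obs) for obs in observations], dtype=int)
    max_len = int(lengths.max()) if n_obs else 0

    # valid[i, j] is True where observation i really has a j-th point
    valid = np.arange(max_len) < lengths[:, None]

    # Integer storage from the start, so no NaN-to-int cast is needed
    data = np.zeros((n_obs, max_len, width), dtype=int)
    points = [point for obs in observations for point in obs]
    if points:
        # Boolean indexing fills the valid slots in row-major order,
        # which matches the order in which `points` was flattened
        data[valid] = np.array(points, dtype=int)

    # Expand the per-point validity to one mask flag per coordinate
    mask = np.repeat(~valid[:, :, None], width, axis=2)
    return np.ma.masked_array(data, mask=mask)

s = [[], [(2, 0), (2, 2)], [(2, 2), (2, 0), (2, 1), (2, 2)]]
result = pad_to_masked(s)
```

With the masked array in hand, reductions skip the padding automatically: for example, result.count(axis=1) gives the number of real entries per time step and coordinate, and result[1].compressed() returns only the unmasked values of the second observation.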