How to group elements of a numpy array with the sa

2020-07-09 09:33发布

问题:

As usual intro, I am a tyro in python. However, I got quite a big project to code. It is a surface flow model with Cell Automata. Anyway, I also want to include building roofs in my model. Imagine you have an ascii file indicating buildings with 1s, while the rest is 0. There are just those two states. Now, I want to find all adjacent cells indicating the same building and store them (or rather the information of y,x and one more (maybe elevation),so 3 columns) in an individual building arrays. Keep in mind that buildings can have all possible forms though diagonally connected cells doesn't belong to the same building. So only northern, southern, western and eastern cells can belong to the same building.

I did my homework and googled it but so far I couldn't find a satisfying answer.

example: initial land-cover array:

([0,0,0,0,0,0,0]
 [0,0,1,0,0,0,0]
 [0,1,1,1,0,1,1]
 [0,1,0,1,0,0,1]
 [0,0,0,0,0,0,0])

output(I need to now the coordinates of the cells in my initial array):

 building_1=([1,2],[2,1],[2,2],[2,3],[3,1],[3,3])
 building_2=([2,5],[2,6],[3,6])

Any help is greatly appreciated!

回答1:

You can use the label function from scipy.ndimage to identify the distinct buildings.

Here's your example array, containing two buildings:

In [57]: a
Out[57]: 
array([[0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 0],
       [0, 1, 1, 1, 0, 1, 1],
       [0, 1, 0, 1, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 0]])

Import label.

In [58]: from scipy.ndimage import label

Apply label to a. It returns two values: the array of labeled positions, and the number of distinct objects (buildings, in this case) found.

In [59]: lbl, nlbls = label(a)

In [60]: lbl
Out[60]: 
array([[0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 0],
       [0, 1, 1, 1, 0, 2, 2],
       [0, 1, 0, 1, 0, 0, 2],
       [0, 0, 0, 0, 0, 0, 0]], dtype=int32)

In [61]: nlbls
Out[61]: 2

To get the coordinates of a building, np.where can be used. For example,

In [64]: np.where(lbl == 2)
Out[64]: (array([2, 2, 3]), array([5, 6, 6]))

It returns a tuple of arrays; the kth array holds the coordinates of the kth dimension. You can use, for example, np.column_stack to combine these into an array:

 In [65]: np.column_stack(np.where(lbl == 2))
 Out[65]: 
 array([[2, 5],
        [2, 6],
        [3, 6]])

You might want a list of all the coordinate arrays. Here's one way to create such a list.

For convenience, first create a list of labels:

In [66]: labels = range(1, nlbls+1)

In [67]: labels
Out[67]: [1, 2]

Use a list comprehension to create the list of coordinate arrays.

In [68]: coords = [np.column_stack(where(lbl == k)) for k in labels]

In [69]: coords
Out[69]: 
[array([[1, 2],
       [2, 1],
       [2, 2],
       [2, 3],
       [3, 1],
       [3, 3]]),
 array([[2, 5],
       [2, 6],
       [3, 6]])]

Now your building data is in labels and coords. For example, the first building was labeled labels[0], and its coordinates are in coords[0]:

In [70]: labels[0]
Out[70]: 1

In [71]: coords[0]
Out[71]: 
array([[1, 2],
       [2, 1],
       [2, 2],
       [2, 3],
       [3, 1],
       [3, 3]])


回答2:

Thank you for the great answers! Here is a little correction. If you see the landcover array, I actually don't have 0 as background information but -9999 (0 is too precious for GIS people). I forgot to mention that. But thanks to machine yearning's hint, I made a work-around by assigning all -9999 with 0 through landcover = np.where(landcover > -9999, landcover, 0). After that I can use label. The actual aim was to find the lowest cell and to assign it as outlet. If somebody has a more efficient way, please let me know!

import numpy as np
from scipy.ndimage import label

Original data set has -9999 as background information and 1 as building cells.

landcover = np.array([[-9999,-9999,-9999,-9999,-9999,-9999,1], 
                       [-9999,-9999,1,-9999,-9999,-9999,-9999],
                       [-9999,1,1,1,-9999,1,1], 
                       [-9999,1,-9999,1,-9999,-9999,1], 
                       [-9999,-9999,-9999,-9999,-9999,-9999,-9999]],dtype=int)

Here is a random digital elevation map.

DEM = np.array([[7,4,3,2,4,5,4], 
               [4,5,5,3,5,6,7],
               [2,6,4,7,4,4,4],
               [3,7,8,8,10,9,7],
               [2,5,7,7,9,10,8]],dtype=float)

I changed all -9999 entries to 0 in order to use label @thanks to machine yearning

 landcover = np.where(landcover > -9999, landcover, 0)

Then I labeled distinct buildings and counting those distinctions @Warren Weckesser, the rest pretty much yours. thanks!

 lbl, nlbls = label(landcover)
 bldg_number = range(1, nlbls+1)
 bldg_coord = [np.column_stack(where(lbl == k)) for k in bldg_no]
 outlets=np.zeros([nlbls,3])

I am iterating over the bldg_coord list in order to determine the lowest cells which will be assigned as outlet

 for i in range(0, nlbls):
     building=np.zeros([bldg_coord[i].shape[0],3])
     for j in range(0,bldg_coord[i].shape[0]):
         building[j][0]=bldg_coord[i][j][0]
         building[j][1]=bldg_coord[i][j][1]
         building[j][2]=DEM[bldg_coord[i][j][0],bldg_coord[i][j][1]]

I sort the building array in ascending order according to the DEM information of each building cell in order to find the lowest lying building cells.

  building=building[building[:,2].argsort()]

The lowest building cell will be used as roof outlet for rainwater

  outlets[i][0]=building[0][0]
  outlets[i][1]=building[0][1]
  outlets[i][2]=bldg_coord[i].shape[0]

Here is the output. The first two columns are indices in den landcover array and the last is the number of adjacent building cells.

>>> outlets
array([[ 0.,  6.,  1.],
       [ 2.,  2.,  6.],
       [ 2.,  5.,  3.]])


回答3:

It looks like this function does exactly what you're looking for (from the numpy documentation):

numpy.argwhere(a):

Find the indices of array elements that are non-zero, grouped by element.

>>> x = np.arange(6).reshape(2,3)
>>> x
array([[0, 1, 2],
       [3, 4, 5]])
>>> np.argwhere(x>1)
array([[0, 2],
       [1, 0],
       [1, 1],
       [1, 2]])

Alternatively it seems like your use case requires using the returned coordinates to index arrays.

The output of argwhere is not suitable for indexing arrays. For this purpose use where(a) instead.

You might want to try numpy.where instead.