My problem is the following:
I have a 2D numpy array filled with 0s and 1s, with an absorbing boundary condition (all the outer elements are 0), for example:
[[0 0 0 0 0 0 0 0 0 0]
[0 0 1 0 0 0 0 0 0 0]
[0 0 1 0 1 0 0 0 1 0]
[0 0 0 0 0 0 1 0 1 0]
[0 0 0 0 0 0 1 0 0 0]
[0 0 0 0 1 0 1 0 0 0]
[0 0 0 0 0 1 1 0 0 0]
[0 0 0 1 0 1 0 0 0 0]
[0 0 0 0 1 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]]
I want to create a function that takes this array and its linear dimension L as input parameters (in this case L = 10) and returns the list of cluster sizes of this array.
By "clusters" I mean the isolated groups of elements equal to 1 in the array.
The array element [i][j] is isolated if all its neighbours are zeros, and its neighbours are the elements:
[i+1][j]
[i-1][j]
[i][j+1]
[i][j-1]
So in the previous array we have 7 clusters of sizes (2,1,2,6,1,1,1)
I tried to complete this task by creating two functions, the first one is a recursive function:
def clust_size(array,i,j):
    count = 0
    if array[i][j] == 1:
        array[i][j] = 0
        if array[i-1][j] == 1:
            count += 1
            array[i-1][j] = 0
            clust_size(array,i-1,j)
        elif array[i][j-1] == 1:
            count += 1
            array[i-1][j] = 0
            clust_size(array,i,j-1)
        elif array[i+1][j] == 1:
            count += 1
            array[i-1][j] = 0
            clust_size(array,i+1,j)
        elif array[i][j+1] == 1:
            count += 1
            array[i-1][j] = 0
            clust_size(array,i,j+1)
    return count+1
and it should return the size of one cluster. Every time the function finds an array element equal to 1, it increases the value of the counter "count" and changes the value of the element to 0, so that each '1' element is counted only once. If one of the neighbours of the element is equal to 1, then the function calls itself on that element.
The second function is:
def clust_list(array,L):
    sizes_list = []
    for i in range(1,L-1):
        for i in range(1,L-1):
            count = clust_size(array,i,j)
            sizes_list.append(count)
    return sizes_list
and it should return the list containing the cluster sizes. The for loop iterates from 1 to L-1 because all the outer elements are 0.
This doesn't work and I can't see where the error is...
I was wondering if maybe there's an easier way to do it.
It seems like a percolation problem. The following link has your answer if you have scipy installed.
http://dragly.org/2013/03/25/working-with-percolation-clusters-in-python/
This will identify the clusters:
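A minimal sketch of the labelling step, assuming SciPy's scipy.ndimage.label with its default structuring element (which, in 2D, connects only the four direct neighbours). The variable names grid, labels and n_clusters are placeholders chosen for illustration:

```python
import numpy as np
from scipy import ndimage

# The example array from the question.
grid = np.array([
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
])

# label() returns an array of the same shape in which every cell of a
# cluster carries that cluster's integer label (0 is the background),
# together with the number of clusters found.
labels, n_clusters = ndimage.label(grid)
print(n_clusters)
print(labels)
```

Passing structure=np.ones((3, 3)) to label() would also treat diagonal neighbours as connected, which gives different clusters from the 4-neighbour definition in the question.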
The following will calculate their area.
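A sketch of the area step, assuming the labels come from scipy.ndimage.label as above. Here np.bincount counts how many cells carry each label, and the first entry (label 0, the background) is dropped. The function name cluster_sizes and the small demo array are made up for this example:

```python
import numpy as np
from scipy import ndimage

def cluster_sizes(grid):
    """Return the sizes of the 4-connected clusters of nonzero cells in grid."""
    labels, n_clusters = ndimage.label(grid)
    # bincount[k] is the number of cells with label k; index 0 is background.
    return np.bincount(labels.ravel())[1:].tolist()

# Small demo: one cluster of two cells and one isolated cell.
demo = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
])
print(cluster_sizes(demo))
```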
This gives what you expect, although I would think that you would have a cluster with 8 members by eye-percolation.
I feel that your problem of finding "clusters" is essentially the same as the problem of finding connected components in a binary image (with values of either 0 or 1) based on 4-connectivity. You can see several algorithms to identify the connected components (or "clusters" as you defined them) on this Wikipedia page:
http://en.wikipedia.org/wiki/Connected-component_labeling
Once the connected components or "clusters" are labelled, you can easily find any information you want about them, including the area or relative position.
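As an illustration, here is a sketch of one of the simpler labelling approaches: growing one component at a time with a queue (a breadth-first flood fill) under 4-connectivity. It uses only the standard library, and the function name label_components is chosen just for this example:

```python
from collections import deque

def label_components(grid):
    """Label 4-connected components of cells equal to 1.

    Returns (labels, sizes): labels has the same shape as grid, with 0 for
    background and 1, 2, ... for the components in raster-scan order;
    sizes[k - 1] is the number of cells in component k.
    """
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or labels[r][c]:
                continue  # background, or already part of a component
            current = len(sizes) + 1
            labels[r][c] = current
            queue = deque([(r, c)])
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                # Visit the four direct neighbours only (4-connectivity).
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == 1 and not labels[ny][nx]):
                        labels[ny][nx] = current
                        queue.append((ny, nx))
            sizes.append(size)
    return labels, sizes
```

Because the bounds are checked explicitly, this version does not rely on the outer elements being 0, and it leaves the input array unmodified.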
I believe that your way is almost correct, except that you are initializing the variable count over and over again whenever you recursively call your function clust_size. I would add the count variable to the input parameters of clust_size and just reinitialize it with count = 0 for every first call in your nested for loops. Like this, you would always call clust_size like count = clust_size(array, i, j, count).
I haven't tested it but it seems to me that it should work. Hope it helps.
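A sketch of what that change could look like, with count passed in and returned by every call. A few details here are assumptions that go beyond the suggestion itself: the four neighbours are checked in a loop rather than an if/elif chain (so that every branch of a cluster is followed), the inner loop variable is j, only cells equal to 1 start a new count, and the function works on a copy so the caller's array is not zeroed:

```python
def clust_size(array, i, j, count=0):
    """Return count plus the size of the cluster containing (i, j).

    Visited cells are set to 0 so that each one is counted only once.
    Relies on the outer elements being 0, so a neighbour index never
    leaves the array.
    """
    if array[i][j] != 1:
        return count
    array[i][j] = 0
    count += 1
    for di, dj in ((-1, 0), (0, -1), (1, 0), (0, 1)):
        count = clust_size(array, i + di, j + dj, count)
    return count

def clust_list(array, L):
    """Return the list of cluster sizes of an L x L array of 0s and 1s."""
    work = [list(row) for row in array]  # copy, so the input stays intact
    sizes_list = []
    for i in range(1, L - 1):
        for j in range(1, L - 1):
            if work[i][j] == 1:
                sizes_list.append(clust_size(work, i, j, 0))
    return sizes_list
```

For very large clusters the recursion can exceed Python's default recursion limit, in which case an iterative flood fill is the safer choice.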
A relatively simple problem if you convert this to strings
output