I am trying to solve a problem that is based on a 2D array. This array contains different kinds of elements (from a total of 3 possible kinds). Let's call the kinds X, Y and Z.
The array looks something like this. Note that in practice it is always completely filled; the diagram is only for illustration.
7 | | | | | | |
6 | | | | | | |
5 | | | | | | |
4 | |X|Z|Y|X| |
3 | |Y|X|Y|Y|X|
2 |Y|Y|X|Z|Z|X|
1 |X|X|Y| |X|X|
0 | | | |Z| | |
0 1 2 3 4 5
I am trying to create sets of elements that are placed adjacent to each other. For example, set1 may consist of elements of type X located at (0,1), (1,1), (2,2), (2,3), (1,4). Similarly, set2 may consist of elements of type Y located at (3,4), (3,3), (4,3).
Problem: Starting from any point in the array, the algorithm must add every element to the appropriate set and ensure that no two sets contain the same element. Note that a set is only created if more than 2 adjacent elements of the same kind are encountered.
Moreover, if a certain subset of elements is removed, more elements are added to replace the removed ones. The array must then be re-iterated over to make new sets or modify the existing ones.
Solution: I implemented a recursive solution that iterates over all the adjacent elements of, for example, element X at (0,1). While iterating over the 8 possible adjacent elements, it calls itself recursively whenever an element of type X occurs.
This solution is too brute-force and inefficient, especially when some elements are replaced with new ones of possibly different types. In that case, almost the whole array has to be re-iterated to make or modify sets and to ensure that no element exists in more than one set.
Is there an algorithm that deals efficiently with this kind of problem? I need help with some ideas/suggestions or pseudocode.
I wrote something to find objects of just one type for another SO question. The example below adds two more types. Any re-iteration would examine the whole list again. The idea is to process the list of points for each type separately. The function solve groups any connected points and removes them from the list before enumerating the next group. areConnected checks the relationship between the points' coordinates since we are only testing points of one type. In this generalized version, the types (a b c) could be anything (strings, numbers, tuples, etc.), as long as they match.
btw - here's a link to a JavaScript example of j_random_hacker's terrific algorithm: http://jsfiddle.net/groovy/fP5kP/
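For readers who do not use Haskell, the idea described above might be sketched in Python roughly as follows. This is only an illustration under assumptions, not a translation of the Haskell code: the grid is assumed to be a dict mapping (x, y) to a kind, solve and are_connected mirror the names used above, and groups_by_type is a hypothetical helper added for the example. It is quadratic in the number of points of one kind, which is fine for small boards.

```python
from collections import defaultdict

def are_connected(p, q):
    """True if p and q are distinct cells touching horizontally, vertically or diagonally."""
    return p != q and abs(p[0] - q[0]) <= 1 and abs(p[1] - q[1]) <= 1

def solve(points):
    """Split a list of same-kind points into groups of connected points."""
    remaining = list(points)
    groups = []
    while remaining:
        # Start a new group from any point that is still unassigned.
        group = [remaining.pop()]
        frontier = list(group)
        while frontier:
            current = frontier.pop()
            touching = [p for p in remaining if are_connected(current, p)]
            # Remove the newly grouped points so no point lands in two groups.
            remaining = [p for p in remaining if p not in touching]
            group.extend(touching)
            frontier.extend(touching)
        groups.append(sorted(group))
    return groups

def groups_by_type(grid):
    """grid maps (x, y) -> kind; returns kind -> list of groups (hypothetical helper)."""
    by_kind = defaultdict(list)
    for pos, kind in grid.items():
        by_kind[kind].append(pos)
    return {kind: solve(points) for kind, points in by_kind.items()}
```

Groups with fewer than 3 members can then simply be filtered out of the result.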
Haskell code:
Sample output:
In your situation, I would rely, at least, on two different arrays:
It might be possible to create more supporting arrays like, for example, one including the minimum/maximum X/Y values for each set to speed up the analysis (although it would be pretty quick anyway, as shown below).
You are not mentioning any programming language, but I include a sample (C#) code because it is the best way to explain the point. Please don't take it as a suggestion of the best way to proceed (personally, I don't like Dictionaries/Lists too much; although I think they do provide a good graphical way to show an algorithm, even for inexperienced C# users). This code only intends to show a data storage/retrieval approach; the best way to achieve optimal performance depends on the target language and further issues (e.g., dataset size) and is something you have to take care of.
Where isSurroundingPoint is a function checking whether both points are close to each other:
[EDIT 5/8/2013: Fixed time complexity. (O(a(n)) is essentially constant time!)]
In the following, by "connected component" I mean the set of all positions that are reachable from each other by a path that allows only horizontal, vertical or diagonal moves between neighbouring positions having the same kind of element. E.g. your example {(0,1), (1,1), (2,2), (2,3), (1,4)} is a connected component in your example input. Each position belongs to exactly one connected component.
We will build a union/find data structure that will be used to give every position (x, y) a numeric "label" having the property that if and only if any two positions (x, y) and (x', y') belong to the same component then they have the same label. In particular this data structure supports three operations:
- set(x, y, i) will set the label for position (x, y) to i.
- find(x, y) will return the label assigned to the position (x, y).
- union(Z), for some set of labels Z, will combine all labels in Z into a single label k, in the sense that future calls to find(x, y) on any position (x, y) that previously had a label in Z will now return k. (In general k will be one of the labels already in Z, though this is not actually important.) union(Z) also returns the new "master" label, k.
If there are n = width * height positions in total, this can be done in O(n*a(n)) time, where a() is the extremely slow-growing inverse Ackermann function. For all practical input sizes, this is the same as O(n).
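Notice that whenever two vertices are adjacent to each other, there are four possible cases:
- One is to the left of the other (connected by a horizontal edge)
- One is above the other (connected by a vertical edge)
- One is above and to the left of the other (connected by a \ diagonal edge)
- One is above and to the right of the other (connected by a / diagonal edge)
We can use the following pass to determine labels for each position (x, y):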
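One way such a pass could look, as a minimal Python sketch. The assumptions here are mine: the grid is a list of rows indexed as grid[y][x], every cell starts with its own label, and labels are merged pairwise with a standard union by size plus path halving instead of the set(x, y, i) / union(Z) interface described above. label_components and find_label are hypothetical names.

```python
def label_components(grid):
    """Return find(x, y), which gives the component label of each cell of grid[y][x]."""
    height, width = len(grid), len(grid[0])
    parent = list(range(width * height))   # every cell starts as its own label
    size = [1] * (width * height)

    def find_label(i):
        # Walk up to the root label, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find_label(i), find_label(j)
        if ri == rj:
            return ri
        if size[ri] < size[rj]:
            ri, rj = rj, ri
        parent[rj] = ri                     # attach the smaller tree to the larger
        size[ri] += size[rj]
        return ri

    # Checking these four already-visited neighbours covers all four edge
    # cases: horizontal, \ diagonal, vertical and / diagonal.
    for y in range(height):
        for x in range(width):
            for dx, dy in ((-1, 0), (-1, -1), (0, -1), (1, -1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < width and 0 <= ny < height and grid[ny][nx] == grid[y][x]:
                    union(ny * width + nx, y * width + x)

    def find(x, y):
        return find_label(y * width + x)

    return find
```

Each cell only looks at neighbours that were visited earlier in the scan, so every adjacent pair is examined exactly once.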
After this, calling find(x, y) on any position (x, y) effectively tells you which component it belongs to. If you want to be able to quickly answer queries of the form "Which positions belong to the connected component containing position (x, y)?" then create a hashtable of lists posInComp and make a second pass over the input array, appending each (x, y) to the list posInComp[find(x, y)]. This can all be done in linear time and space. Now to answer a query for some given position (x, y), simply call lab = find(x, y) to find that position's label, and then list the positions in posInComp[lab].
To deal with "too-small" components, just look at the size of posInComp[lab]. If it's 1 or 2, then (x, y) does not belong to any "large-enough" component.
Finally, all this work effectively takes linear time, so it will be lightning fast unless your input array is huge. So it's perfectly reasonable to recompute it from scratch after modifying the input array.
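The posInComp lookup and the size check can be sketched in Python like this. It assumes that find is any function returning a component label for (x, y), as described above; build_pos_in_comp, component_of and the min_size parameter are hypothetical names chosen for this example.

```python
from collections import defaultdict

def build_pos_in_comp(width, height, find):
    """Second pass: bucket every position under its component label."""
    pos_in_comp = defaultdict(list)
    for y in range(height):
        for x in range(width):
            pos_in_comp[find(x, y)].append((x, y))
    return pos_in_comp

def component_of(x, y, find, pos_in_comp, min_size=3):
    """Positions in the component containing (x, y), or [] if it is too small."""
    members = pos_in_comp[find(x, y)]
    return members if len(members) >= min_size else []
```

The default min_size=3 reflects the "more than 2 adjacent elements" rule from the question.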
You may want to check out region growing algorithms, which are used for image segmentation. These algorithms start from a seed pixel and grow a contiguous region where all the pixels in the region have some property.
In your case, adjacent 'pixels' are in the same image segment if they have the same label (i.e., the same kind of element: X, Y or Z).
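A minimal region-growing sketch in Python, assuming the grid is a list of rows indexed as grid[y][x] and that neighbours include diagonals as in the question. grow_region is a hypothetical name, and this is a plain breadth-first flood fill rather than any particular image-segmentation library routine.

```python
from collections import deque

def grow_region(grid, seed):
    """Grow the region of same-kind cells around seed = (x, y), with 8-connectivity."""
    height, width = len(grid), len(grid[0])
    sx, sy = seed
    kind = grid[sy][sx]
    region = {seed}
    queue = deque([seed])
    while queue:
        x, y = queue.popleft()
        # Visit the 8 surrounding cells; the cell itself is already in region.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                nx, ny = x + dx, y + dy
                if (0 <= nx < width and 0 <= ny < height
                        and (nx, ny) not in region
                        and grid[ny][nx] == kind):
                    region.add((nx, ny))
                    queue.append((nx, ny))
    return region
```

After elements are replaced, only regions seeded at the changed cells and their neighbours would need to be regrown.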