Algorithm to find peaks in 2D array

2019-04-04 19:01发布

问题:

Let's say I have a 2D accumulator array in java int[][] array. The array could look like this:

(x and z axes represent indexes in the array, y axis represents values - these are images of an int[56][56] with values from 0 ~ 4500)

or

What I need to do is find peaks in the array - there are 2 peaks in the first one and 8 peaks in the second array. These peaks are always 'obvious' (there's always a gap between peaks), but they don't have to be similar like on these images, they can be more or less random - these images are not based on the real data, just samples. The real array can have size like 5000x5000 with peaks from thousands to several hundred thousands... The algorithm has to be universal, I don't know how big the array or peaks can be, I also don't know how many peaks there are. But I do know some sort of threshold - that the peaks can't be smaller than a given value.

The problem is, that one peak can consist of several smaller peaks nearby (first image), the height can be quite random and also the size can be significantly different within one array (size - I mean the number of units it takes in the array - one peak can consist from 6 units and other from 90). It also has to be fast (all done in 1 iteration), the array can be really big.

Any help is appreciated - I don't expect code from you, just the right idea :) Thanks!


edit: You asked about the domain - but it's quite complicated and imho it can't help with the problem. It's actually an array of ArrayLists with 3D points, like ArrayList< Point3D >[][] and the value in question is the size of the ArrayList. Each peak contains points that belong to one cluster (plane, in this case) - this array is a result of an algorithm, that segments a pointcloud . I need to find the highest value in the peak so I can fit the points from the 'biggest' arraylist to a plane, compute some parameters from it and than properly cluster most of the points from the peak.

回答1:

He's not interested in estimating the global maximum using some sort of optimization heuristic - he just wants to find the maximum values within each of a number of separate clusters.

These peaks are always 'obvious' (there's always a gap between peaks)

Based on your images, I assume you mean there's always some 0-values separating clusters? If that's the case, you can use a simple flood-fill to identify the clusters. You can also keep track of each cluster's maximum while doing the flood-fill, so you both identify the clusters and find their maximum simultaneously.

This is also as fast as you can get, without relying on heuristics (which could return the wrong answer), since the maximum of each cluster could potentially be any value in the cluster, so you have to check them all at least once.


Note that this will iterate through every item in the array. This is also necessary, since (from the information you've given us) it's potentially possible for any single item in the array to be its own cluster (which would also make it a peak). With around 25 million items in the array, this should only take a few seconds on a modern computer.



回答2:

This might not be an optimal solution, but since the problem sounds somewhat fluid too, I'll write it down.

  1. Construct a list of all the values (and coordinates) that are over your minimum treshold.
  2. Sort it in descending order of height.
  3. The first element will be the biggest peak, add it to the peak list.
  4. Then descend down the list, if the current element is further than the minimum distance from all the existing peaks, add it to the peak list.

This is a linear description but all the steps (except 3) can be trivially parallelised. In step 4 you can also use a coverage map: a 2D array of booleans that show which coordinates have been "covered" by a nearby peak.

(Caveat emptor: once you refine the criteria, this solution might become completely unfeasible, but in general it works.)



回答3:

Simulated annealing, or hill climbing are what immediately comes to mind. These algorithms though will not guarantee that all peaks are found.

However if your "peaks" are separated by values of 0 as the gap, maybe a connected components analysis would help. You would label a region as "connected" if it is connected with values greater than 0(or if you have a certain threshold, label regions as connected that are over that threshold), then your number of components would be your number of peaks. You could also then do another pass of the array to find the max of each component.

I should note that connected components can be done in linear time, and finding the peak values can also be done in linear time.