I have a partially ordered set, say A = [x1, x2, ...], meaning that for each xi and xj in the set, (exactly) one of four possibilities is true: xi < xj, xi == xj, xi > xj, or xi and xj are incomparable.
I want to find the maximal elements (i.e., those elements xi for which there are no elements xj with xi < xj). What is an efficient algorithm to do this (minimize the number of comparisons)? I tried building a DAG and doing a topological sort, but just building the graph requires O(n^2) comparisons, which is too many.
I'm doing this in Python, but if you don't know it I can read other languages, or pseudocode.
It seems the worst case is O(n^2) no matter what you do. For example, if no elements are comparable, then you need to compare every element to every other element in order to determine that they are all maximal.
And if you allow O(n^2), since the ordering is transitive, you can just make one pass through the set, keeping a list of all elements that are maximal so far; each new element knocks out any maximal elements that are < it and gets added to the maximal list if it is not < any maximal element.
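A minimal Python sketch of that one-pass approach follows. It assumes the partial order is exposed as a strict comparison `less(a, b)` (a hypothetical parameter name) that returns True exactly when a < b and False for equal or incomparable pairs. By default it uses Python's `<`, which for `frozenset` is the proper-subset relation, a genuine partial order.

```python
import operator
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def maximal_elements(items: Iterable[T],
                     less: Callable[[T, T], bool] = operator.lt) -> List[T]:
    """Return the maximal elements of `items` under the strict order `less`.

    Assumption: less(a, b) is True exactly when a < b; for equal or
    incomparable elements it is False in both directions.
    """
    maxima: List[T] = []  # maximal elements among those seen so far
    for x in items:
        # x is below a current maximal element: by transitivity it can never
        # become maximal, so drop it.
        if any(less(x, m) for m in maxima):
            continue
        # Otherwise x knocks out every current maximal element below it ...
        maxima = [m for m in maxima if not less(m, x)]
        # ... and joins the maximal-so-far list.
        maxima.append(x)
    return maxima


def divides_strictly(a: int, b: int) -> bool:
    """Hypothetical example order: a < b iff a properly divides b."""
    return a != b and b % a == 0


# frozenset's `<` is the proper-subset relation.
sets = [frozenset({1}), frozenset({1, 2}), frozenset({3}), frozenset({2})]
print(maximal_elements(sets))  # [frozenset({1, 2}), frozenset({3})]
print(maximal_elements([2, 3, 4, 6, 12, 5], divides_strictly))  # [12, 5]
```

Each new element is compared only against the current maximal list, so when most elements are dominated the list stays short. When no elements are comparable the list grows to n and the sketch still makes O(n^2) comparisons, as the previous paragraph says it must.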
In the worst case, you can't be faster than O(n^2). Indeed, to check that all elements are maximal for a poset where no elements are comparable, you need to compare every pair of elements. So it's definitely quadratic in the worst case.
Suppose you have looked at all (n choose 2) comparisons except for one, between xi and xj, i != j. In some scenarios, the only two candidates for being maximal are exactly these two, xi and xj.
If you do not compare xi and xj, you cannot definitively say whether they are both maximal, or whether only one of them is.
Therefore, you must check all possible (n choose 2) (O(n^2)) comparisons.
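To make the argument concrete, here is a small Python check with two hypothetical orders on five elements labelled 0..4, where `XI` = 3 and `XJ` = 4 stand in for the uncompared pair. In the first order, elements 0..2 lie below both XI and XJ, which are incomparable; the second adds XI < XJ (still transitive, since everything below XI is already below XJ). The two orders agree on every pair except (XI, XJ), yet their maximal sets differ.

```python
from itertools import combinations

N = 5
XI, XJ = 3, 4  # hypothetical: the one pair that was never compared


def maxima(n, less):
    """Brute force: i is maximal if no j != i satisfies less(i, j)."""
    return {i for i in range(n) if not any(less(i, j) for j in range(n) if j != i)}


def below_top_two(a, b):
    # Elements 0..2 lie below both XI and XJ; XI and XJ are incomparable.
    return a < XI and b in (XI, XJ)


def below_top_two_plus_edge(a, b):
    # The same order, plus XI < XJ.
    return below_top_two(a, b) or (a, b) == (XI, XJ)


# Pairs on which the two orders give different comparison results.
differ = [(a, b) for a, b in combinations(range(N), 2)
          if (below_top_two(a, b), below_top_two(b, a))
          != (below_top_two_plus_edge(a, b), below_top_two_plus_edge(b, a))]
print(differ)                              # [(3, 4)]
print(maxima(N, below_top_two))            # {3, 4}
print(maxima(N, below_top_two_plus_edge))  # {4}
```

An algorithm that skips the (XI, XJ) comparison sees identical answers in both cases, so it cannot return the correct maximal set for both.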
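Note that this assumes your partially ordered set is specified by a black box that performs comparisons. If the partially ordered set is given as a graph to begin with, you can find the set of maximal elements by picking out the nodes with no outgoing edges, in time linear in the size of the graph (O(n + m) for n nodes and m edges), which is sub-O(n^2) whenever the graph is sparse.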
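As a sketch of the graph case, assume the order arrives as an adjacency dict (a hypothetical input format) with an edge x -> y whenever x < y in the Hasse diagram, or in any DAG whose reachability is the order. Then the maximal elements are exactly the nodes with no outgoing edge, which one pass over the adjacency lists finds:

```python
from typing import Dict, Hashable, Iterable, Set


def maximal_from_graph(succ: Dict[Hashable, Iterable[Hashable]]) -> Set[Hashable]:
    """Assumption: succ[x] lists the y with an edge x -> y, meaning x < y.

    x is maximal iff it has no outgoing edge: O(n + m) for n nodes, m edges.
    """
    nodes: Set[Hashable] = set()
    has_out_edge: Set[Hashable] = set()
    for x, ups in succ.items():
        nodes.add(x)
        for y in ups:
            nodes.add(y)  # a node may appear only as an edge target
            has_out_edge.add(x)
    return nodes - has_out_edge


# Hasse diagram of {1, 2, 3, 4, 6} ordered by divisibility.
hasse = {1: [2, 3], 2: [4, 6], 3: [6]}
print(maximal_from_graph(hasse))  # {4, 6}
```

No element comparisons happen here; the work shifts to whoever built the graph, which is exactly the O(n^2) cost the question is trying to avoid.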
As other answers have pointed out, the worst case complexity is O(n^2).
However, there are heuristics that can help a lot in practice. For example if the set A is a subset of Z^2 (integer pairs), then we can eliminate a lot of points upfront by:
This is of cost O(n). It is easy to see that any maximal point will be present in xy-maximals. However, it can contain non-maximal points. For example, consider the set {(1,0), (0,1), (2,2)}.
Depending on your situation, this may be a good enough heuristic. You can follow this up with the exhaustive algorithm on the smaller set xy-maximals.
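If the order on Z^2 is the componentwise one (an assumption, consistent with the example above in which (2,2) dominates both other points), the exact set of maximal points in the plane can also be computed in O(n log n) with the standard sort-and-sweep for 2D maxima. This is a different technique from the heuristic above, shown here as a Python sketch:

```python
from typing import Iterable, List, Optional, Tuple

Point = Tuple[int, int]


def pareto_front_2d(points: Iterable[Point]) -> List[Point]:
    """Maximal points of a finite subset of Z^2 under the componentwise order.

    Assumption: (a, b) < (c, d) means a <= c and b <= d with (a, b) != (c, d).
    Duplicate points are collapsed into one. Runs in O(n log n).
    """
    front: List[Point] = []
    best_y: Optional[int] = None  # largest y among the points already swept
    # Sweep from the largest x down; ties on x put the largest y first, so every
    # point that could dominate p is visited before p.
    for x, y in sorted(set(points), key=lambda p: (-p[0], -p[1])):
        # p is undominated iff no earlier point has a y at least as large.
        if best_y is None or y > best_y:
            front.append((x, y))
            best_y = y
    return front


print(pareto_front_2d([(1, 0), (0, 1), (2, 2)]))          # [(2, 2)]
print(pareto_front_2d([(1, 3), (2, 2), (3, 1), (1, 1)]))  # [(3, 1), (2, 2), (1, 3)]
```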
More generally, this problem is called the 'Pareto Frontier' calculation problem. Here are good references:
http://www.cs.yorku.ca/~jarek/papers/vldbj06/lessII.pdf
https://en.wikipedia.org/wiki/Pareto_efficiency#Use_in_engineering_and_economics
In particular the BEST algorithm from the first reference is quite useful.