A range intersection algorithm better than O(n)?

2020-01-30 04:14发布

Range intersection is a simple, but non-trivial problem.

Its has been answered twice already:

The first solutions is O(n) and the second solution is for a database (which is less than O(n) of course).

I have the same problem, but for a large n and I am not within a database.

This problem seems to be very similar to Store 2D points for quick retrieval of those inside a rectangle but I don't see how it maps.

So what data structure would you store the set of ranges in, such that a search on a range costs less than O(n)? (Extra credit for using libraries available for Java)

EDIT:

I want to get a subset of all intersecting ranges, meaning the search range could intersect multiple ranges.

The method that needs to be less than O(n) in Java is:

public class RangeSet {
    ....
    public Set<Range> intersects(Range range);
    ....
}

Where Range is just a class containing a pair of int start and end.

This is not an impossible question, I already have the solution, I just wanted to see if there was a more standard/simpler way of doing it

9条回答
地球回转人心会变
2楼-- · 2020-01-30 04:55

Overlapping Ranges:

Prep O(n log n):

  1. Make a array / vector of the ranges.
  2. Sort the vector by the end of the range (break ties by sorting by the start of the range)
  3. Make a second vector of ints. This represents the point at which you can stop searching.

    int stop[size];
    stop[size-1] = Ranges[size - 1].start;
    for (int i = size - 2; i >= 0; i--)
    {
        stop[i] = min(Ranges[i].start, stop[i+1]);
    }
    

Search:

  1. Use binary search to find the first range with an End value of >= TestRange.Start
  2. Iterator starting at the binary search until stop[i] > TestRange.End:

    2a. If the range if the current range is within the TestRange, add it to your result.

查看更多
男人必须洒脱
3楼-- · 2020-01-30 05:02

Just as a quad tree works for a set of 2d points, a simple binary tree should work for this case. Build a tree with your ranges.

To explain further: Each node in the tree contains two integers, the beginning and end of the range, and the two children if it's not a leaf node. To find the ranges that your input range spans, then starting at the top of the tree

  - if the node range intersects the input range:
     - if it's a leaf node, then add the range to your result list
     - if it's not a leaf node, then traverse down to the child nodes and repeat this process.

It should be O(logN)

Further detail: The binary tree would be structured like a 1-d version of a quad tree. Each node would have three integers (sorry I said two above, but now I realize you need three), the lowest representing the lowest value of the lowest range that's below this node, the highest representing the highest value of the highest range that's below this node, and the pivot. The left child would span from this node's lowest to its pivot. The right child would span from this node's pivot to this node's highest. If there is only one range that goes from "lowest" to "highest", you wouldn't have a pivot and this would be a leaf. Ideally you'd pick the pivots for each node to keep the tree balanced.

查看更多
太酷不给撩
4楼-- · 2020-01-30 05:07

Sounds like you need a class that implements the SortedSet interface. TreeSet is the implementation that ships with the core API.

Have one set holding the ranges sorted by lowest value, and one sorted by highest value.

You can then implement the equivalent of the database algorithm using the in-memory sets.

As for whether this is actually faster than O(n), I couldn't say.

查看更多
登录 后发表回答