A range intersection algorithm better than O(n)?

Range intersection is a simple, but non-trivial problem.

Its has been answered twice already:

The first solutions is O(n) and the second solution is for a database (which is less than O(n) of course).

I have the same problem, but for a large n and I am not within a database.

This problem seems to be very similar to Store 2D points for quick retrieval of those inside a rectangle but I don't see how it maps.

So what data structure would you store the set of ranges in, such that a search on a range costs less than O(n)? (Extra credit for using libraries available for Java)

EDIT:

I want to get a subset of all intersecting ranges, meaning the search range could intersect multiple ranges.

The method that needs to be less than O(n) in Java is:

public class RangeSet {
    ....
    public Set<Range> intersects(Range range);
    ....
}

Where Range is just a class containing a pair of int start and end.

This is not an impossible question, I already have the solution, I just wanted to see if there was a more standard/simpler way of doing it

标签： java algorithm search big-o interval-intersection

9条回答

Emotional °昔

2楼-- · 2020-01-30 04:46

The standard approach is to use an interval tree.

In computer science, an interval tree is a tree data structure to hold intervals. Specifically, it allows one to efficiently find all intervals that overlap with any given interval or point. It is often used for windowing queries, for instance, to find all roads on a computerized map inside a rectangular viewport, or to find all visible elements inside a three-dimensional scene. A similar data structure is the segment tree.

The trivial solution is to visit each interval and test whether it intersects the given point or interval, which requires O(n) time, where n is the number of intervals in the collection. Since a query may return all intervals, for example if the query is a large interval intersecting all intervals in the collection, this is asymptotically optimal; however, we can do better by considering output-sensitive algorithms, where the runtime is expressed in terms of m, the number of intervals produced by the query. Interval trees have a query time of O(log n + m) and an initial creation time of O(n log n), while limiting memory consumption to O(n). After creation, interval trees may be dynamic, allowing efficient insertion and deletion of an interval in O(log n). If the endpoints of intervals are within a small integer range (e.g., in the range [1,...,O(n)]), faster data structures exist[1] with preprocessing time O(n) and query time O(1+m) for reporting m intervals containing a given query point.

0人赞添加讨论(0) 举报

做个烂人

3楼-- · 2020-01-30 04:47

If ranges overlap, and one wants to retrieve all the ranges that overlap (or contain) a given target range, most of the solutions above don't appear to work.

As some have pointed out, if (worst-case) all the ranges happen to intersect the target range (for example, if the target range is {0..MAXINT} or similar) then of course it takes O(n) to return the n ranges.

But isn't the interesting and typical/average case, where only a very small % of the n total ranges do intersect the target range? Call the number that do intersect "m" -- in that case, you might conceivably be able to do as well as O(m). And if n=10^9 and m=10, that's a make-or-break difference.

Consider the simple case of a text document that has various regions marked up for their "type" -- perhaps you want to find all the marked-up units that contain or intersect a given contiguous range of text (for example, a paragraph). In HTML, XML, or similar those can only be ancestors of the text-node(s) containing at least some characters of the target range. In typical representations with parent pointers in each node, that's O(m) -- way better than O(n), especially because m is (for short or synchronous target ranges) merely the tree nesting depth, which tends to be even lower than ln(n) because big XML documents in practice get bushier not deeper.

The interesting case is harder: what if your "elements" do not form a tree as in XML, but can overlap as in MECS, CLIX, LMNL, and some other systems? You still want to find all the regions/"elements" that overlap your target, but they aren't so easily organized.

On the other hand, you should be able to do very well because marked-up ranges in many applications are most frequently small -- there are far more words, sentences, and paragraphs in a book, than there are chapters. So even though there may be a huge number of ranges that start before the target and a huge number that end after it, the intersection will be very small on average.

I think that's what the original questioner was getting at, and I'm afraid i didn't see an answer that addresses that problem. If it's not what the original question was about, then I'd like to pose it as a new question.

0人赞添加讨论(0) 举报

Animai°情兽

4楼-- · 2020-01-30 04:51

When I had this problem, I used a sorted array of ranges and a binary search to look for intersections. This is (I believe) O(log n) performance, with a little bit of overhead to deal with overlapping ranges.

The answer to your question is, I think, derivable from the code below, but stopping short of the insertion. I present the entire code to avoid confusion by the differing context - I needed to insert a range of Unicode codepoints into a list of codepoint ranges.

-- EDIT --

Adapting the code below to determine intersections of multiple ranges involves a trivial forward search from the insertion point until a range is found which no longer intersects.

-- END EDIT --

The Range class contains:

final int                               lower;                                  // lower end of range
final int                               upper;                                  // upper end of range

public int compareTo(Object obj) {
    if(obj==null) { return -1; }

    Range                           oth=(Range)obj;

    if(lower<oth.lower) { return -1; }
    if(lower>oth.lower) { return  1; }
    if(upper<oth.upper) { return -1; }
    if(upper>oth.upper) { return  1; }
    return 0;
    }

Range Insertion:

public Builder addRange(int fir, int las) {
    if(fir!=-1) { fir&=0x001FFFFF; }
    if(las!=-1) { las&=0x001FFFFF; }

    if(codepoints==null || codepoints.length==0) {
        codepoints=new Range[]{new Range(fir,las)};
        }
    else {
        int                         idx=Range.findChar(codepoints,fir);
        int                         ins=(idx<0 ? -(idx+1) : idx);

        if(idx<0) {
            if     (ins>0                 && fir==(codepoints[ins-1].upper+1)) { idx=(ins-1); }  // new range adjoins the following range (can't overlap or idx would be >=0)
            else if(ins<codepoints.length && las>=(codepoints[ins  ].lower-1)) { idx=ins;     }  // new range overlaps or adjoins the following range
            }

        if(idx<0) {
            codepoints=(Range[])Util.arrayInsert(codepoints,ins,new Range(fir,las));
            }
        else {
            boolean                 rmv=false;

            for(int xa=(idx+1); xa<codepoints.length && codepoints[xa].lower<=las; xa++) {
                if(las<codepoints[xa].upper) { las=codepoints[xa].upper; }
                codepoints[xa]=null;
                rmv=true;
                }
            if(codepoints[idx].lower>fir || codepoints[idx].upper<las) {
                codepoints[idx]=new Range((codepoints[idx].lower < fir ? codepoints[idx].lower : fir),(codepoints[idx].upper>las ? codepoints[idx].upper : las));
                }
            if(rmv) { codepoints=Range.removeNulls(codepoints); }
            }
        }
    return this;
    }

Binary Search:

static int findChar(Range[] arr, int val) {
    if(arr.length==1) {
        if     (val< arr[0].lower) { return -1; }                             // value too low
        else if(val<=arr[0].upper) { return  0; }                             // value found
        else                       { return -2; }                             // value too high
        }
    else {
        int                             lowidx=0;                               // low index
        int                             hghidx=(arr.length-1);                  // high index
        int                             mididx;                                 // middle index
        Range                           midval;                                 // middle value

        while(lowidx<=hghidx) {
            mididx=((lowidx+hghidx)>>>1);
            midval=arr[mididx];
            if     (val< midval.lower) { hghidx=(mididx-1); }                   // value too low
            else if(val<=midval.upper) { return mididx;     }                   // value found
            else                       { lowidx=(mididx+1); }                   // value too high
            }
        return -(lowidx+1);                                                     // value not found.
        }
    }

0人赞添加讨论(0) 举报

劫难

5楼-- · 2020-01-30 04:51

Edit: It sounds like this solution is more or less an Interval Tree. A more complete implementation of an Interval Tree can be found here.

class TreeNode
{
public:
    long pivot;
    List<Range> leaves;  //Any ranges that intersect the pivot
    TreeNode left;        //Tree nodes that fall to the left of the pivot
    TreeNode right;       //Tree nodes that fall to the right of the pivot
};

Prep O(n log n):

Create the list of ranges
Pick the pivot points (possibly by using a sorted list of the end dates.) ??
Build your tree.

Search:

Use binary search to find the first pivot that is >= TestRange.End
Traverse the tree until the pivot > TestRange.Start

2a. Add the leaves to your result.

Example:

Ranges:

0 - 2
1 - 2
2 - 3
1 - 4
2 - 4
0 - 5
4 - 5
2 - 6
3 - 7

Tree:

                             4
               --------------+------------------
               3             |                 7
               |            1-4                |
               |            2-4                |
               |            0-5                |
               |            4-5                |
      ---------+------                 --------+--------
      2        |    null              6        |       null
 -----+----   2-3                 ----+----   3-7
null  |  null                   null  |  null    
     0-2                             2-6
     1-2

0人赞添加讨论(0) 举报

兄弟一词,经得起流年.

6楼-- · 2020-01-30 04:52

This depends on your exact problem, in the linked question, the ranges where distinct, no common part, and the searched ranged could span multiple ranges. If your problem is the same, its is really easy: Take an array of the ranges, sort them by their lowest values (since they do not overlap, this would be also the same order as sorted by their upper values).

Now just make a binsearch for the your target lower value (or smaller if not exact) and one for the target upper value(or bigger if not exact). The resulting indexes are the the ranges which are coverd. You have to check wheter the ranges at the indexes itself are in- or excluded, but that are just 2 checks. Overall complexity O(log n).

0人赞添加讨论(0) 举报

太酷不给撩

7楼-- · 2020-01-30 04:54

Non Overlapping Ranges:

Prep O(n log n):

Make a array / vector of the ranges.
Sort the vector by the end of the range (break ties by sorting by the start of the range)

Search:

Use binary search to find the first range with an End value of >= TestRange.Start
Iterator starting at the binary search until you find an Start > TestRange.End:

2a. If the range if the current range is within the TestRange, add it to your result.

0人赞添加讨论(0) 举报

1 2 下一页

A range intersection algorithm better than O(n)?

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间