Range intersection is a simple, but non-trivial problem.
Its has been answered twice already:
The first solutions is O(n) and the second solution is for a database (which is less than O(n) of course).
I have the same problem, but for a large n and I am not within a database.
This problem seems to be very similar to Store 2D points for quick retrieval of those inside a rectangle but I don't see how it maps.
So what data structure would you store the set of ranges in, such that a search on a range costs less than O(n)? (Extra credit for using libraries available for Java)
EDIT:
I want to get a subset of all intersecting ranges, meaning the search range could intersect multiple ranges.
The method that needs to be less than O(n) in Java is:
public class RangeSet {
....
public Set<Range> intersects(Range range);
....
}
Where Range is just a class containing a pair of int start and end.
This is not an impossible question, I already have the solution, I just wanted to see if there was a more standard/simpler way of doing it
The standard approach is to use an interval tree.
If ranges overlap, and one wants to retrieve all the ranges that overlap (or contain) a given target range, most of the solutions above don't appear to work.
As some have pointed out, if (worst-case) all the ranges happen to intersect the target range (for example, if the target range is {0..MAXINT} or similar) then of course it takes O(n) to return the n ranges.
But isn't the interesting and typical/average case, where only a very small % of the n total ranges do intersect the target range? Call the number that do intersect "m" -- in that case, you might conceivably be able to do as well as O(m). And if n=10^9 and m=10, that's a make-or-break difference.
Consider the simple case of a text document that has various regions marked up for their "type" -- perhaps you want to find all the marked-up units that contain or intersect a given contiguous range of text (for example, a paragraph). In HTML, XML, or similar those can only be ancestors of the text-node(s) containing at least some characters of the target range. In typical representations with parent pointers in each node, that's O(m) -- way better than O(n), especially because m is (for short or synchronous target ranges) merely the tree nesting depth, which tends to be even lower than ln(n) because big XML documents in practice get bushier not deeper.
The interesting case is harder: what if your "elements" do not form a tree as in XML, but can overlap as in MECS, CLIX, LMNL, and some other systems? You still want to find all the regions/"elements" that overlap your target, but they aren't so easily organized.
On the other hand, you should be able to do very well because marked-up ranges in many applications are most frequently small -- there are far more words, sentences, and paragraphs in a book, than there are chapters. So even though there may be a huge number of ranges that start before the target and a huge number that end after it, the intersection will be very small on average.
I think that's what the original questioner was getting at, and I'm afraid i didn't see an answer that addresses that problem. If it's not what the original question was about, then I'd like to pose it as a new question.
When I had this problem, I used a sorted array of ranges and a binary search to look for intersections. This is (I believe) O(log n) performance, with a little bit of overhead to deal with overlapping ranges.
The answer to your question is, I think, derivable from the code below, but stopping short of the insertion. I present the entire code to avoid confusion by the differing context - I needed to insert a range of Unicode codepoints into a list of codepoint ranges.
-- EDIT --
Adapting the code below to determine intersections of multiple ranges involves a trivial forward search from the insertion point until a range is found which no longer intersects.
-- END EDIT --
The Range class contains:
Range Insertion:
Binary Search:
Edit: It sounds like this solution is more or less an Interval Tree. A more complete implementation of an Interval Tree can be found here.
Prep O(n log n):
Search:
Traverse the tree until the pivot > TestRange.Start
2a. Add the leaves to your result.
Example:
Ranges:
Tree:
This depends on your exact problem, in the linked question, the ranges where distinct, no common part, and the searched ranged could span multiple ranges. If your problem is the same, its is really easy: Take an array of the ranges, sort them by their lowest values (since they do not overlap, this would be also the same order as sorted by their upper values).
Now just make a binsearch for the your target lower value (or smaller if not exact) and one for the target upper value(or bigger if not exact). The resulting indexes are the the ranges which are coverd. You have to check wheter the ranges at the indexes itself are in- or excluded, but that are just 2 checks. Overall complexity O(log n).
Non Overlapping Ranges:
Prep O(n log n):
Search:
Iterator starting at the binary search until you find an Start > TestRange.End:
2a. If the range if the current range is within the TestRange, add it to your result.