I was asked this in an interview. Given a list of integers, How can we find the biggest interval that has all its members in the given list?
E.g. given list 1,3,5,7,4,6,10 then answer would be [3, 7]. Because it has all the elements between 3 and 7.
I tried to answer but I wasn't convincing. The approach I took was to first sort the list and then check it for the biggest interval. But I was asked to do so in O(n)
.
Here's a solution similar to Grigor's. Two main differences are that this solution stores the length of the sequential set instead of other indexes and that this eliminates the need for the last hash-set iteration.
Iterate over the array
Build a hashmap by looking for and updating adjacent set endpoints:
Key - The array values
Value - When the key is an endpoint of a sequential set, store the length of that set. Otherwise, keep it truthy so you only consider things once.
If the current set size is longest, update the longest set size and longest set start.
Here's a JavaScript implementation for clarity, as well as a fiddle to see it in action:
Disclaimer: Since the solution is based on hashtables, the running times are expected, not worst case.
This O(n) solution depends on the integers being unique. If they are not unique, make a hashset with O(1) insertion and membership lookup, and just skip the numbers already encountered, as you go through the list.
Make a O(1) lookup/insert hashmap where the values are the beginnings of ranges, and the keys are the numbers that fit at the end of those ranges. For a value v and a key k, this means that the range starting from v and ending with k-1 inclusive is located at key k.
Go through the list of numbers. For each number n check if the map has a value v at key n. This corresponds to there being a range starting from v that would allow n at the end. If there is, move v to key n+1 and delete the entry at key n. If there isn't any range, insert n at key n+1.
Since the numbers are unique, none of the ranges overlap in the end, but there may be some contiguous ones. Run through the key/value pairs of the map. For each key k and value v, if the map has a value v1 at key k1 = v, then it means that there is a range from v1 to k-1. Insert v1 at k, and delete the entry k1/v1.
Go through the k/v entries of the map to find the largest range [v,k-1] of size k-v, using a running maximum.
For your example:
That would be linear considering dictionaries built with average O(1) hash tables.
The trick is to think of the items as a set instead of a list. This allows you to identify items that are at the start or end of contiguous ranges, because a set lets you check if item-1 or item+1 is present. With that, you can solve the problem in linear time and space.
Pseudo-Code:
C# Code:
note: MaxBy is from MoreLinq
Testing
Small sanity check:
Big contiguous list:
Big fragmented list:
Complexity
This algorithm requires O(N) time and and O(N) space, where N is the number of items in the list, assuming the set operations are constant time.
Note that if the set was given as an input, instead of being built by the algorithm, we would only need O(1) space.
(Some comments say this is quadratic time. I think they assumed all items, instead of just items at the starts of ranges, triggered scans. That would indeed be quadratic, if the algorithm worked that way.)
1 idea: well, I think you have to kinda sort the list anyway, but you can't go with merge or quick sort. But if you have memory, you could use idea from counting sort for integers.
So you can create array of 0 and 1, from 0 to max int value, then fill it with ones if you have value and then find maximum continous array
2 idea: create dictionary of values, find min and max - all O(N) operations:
then, go like
i in range(min, max)
and and find longest continuous subsetbut this could be slow for sparse lists like
[1, 9000, 100000]
EDIT: based on super great answer of Grigor Gevorgyan, here's the code for O(N) dictionary solution in Python (I just love it's simplicity!!!)
output:
I know a solution based on hashing and dynamic programming. Let f(x) be the hash function. The trick is the hash-table value. Consider the longest interval contained in the list, which either starts or ends with x. Then h[f(x)] = y, where y is the other end of that interval. Note that length of that interval will be abs( x - y ) + 1. The algorithm description will make clear why to store that value.
Move over the list. Let i be current index, x := list[ i ] - current number. Now
1. if h[f(x)] is not empty, then we've met number x before. Nothing to do, continue.
2. Check h[f(x-1)] and h[f(x+1)].
2.1. If they're both not empty, that means we've already met x-1 and x+1, and we know some intervals [a..x-1] and [x+1..b] which we've already met in the list. We know it because a=h[f(x-1)] and b=h[f(x+1)] by definition of h. Now when we got x, it means that we now have met the whole interval [a,b], so we update values as follows: h[f(a)] := b and h[f(b)] := a.
Also set h[f(x)] to some value (let's say x, not to impact the answer), just so that next time we meet x in the list, we ignore it. x has already done his job.
2.2. If only one of them is set, let's say h[f(x-1)] = a, that means we've already met some interval [a..x-1], and now it's extended with x. Update will be h[f(a)] := x and h[f(x)] := a.
2.3. If none of them is set, that means we've met neither x-1, nor x+1, and the biggest interval containing x we've already met is the single [x] itself. So set h[f(x)] := x.
Finally, to get the answer, pass over the whole list and take maximum abs( x - h[f(x)] ) + 1 for all x.