Find the biggest interval that has all its members

2020-05-11 09:41发布

I was asked this in an interview. Given a list of integers, How can we find the biggest interval that has all its members in the given list?

E.g. given list 1,3,5,7,4,6,10 then answer would be [3, 7]. Because it has all the elements between 3 and 7.

I tried to answer but I wasn't convincing. The approach I took was to first sort the list and then check it for the biggest interval. But I was asked to do so in O(n).

10条回答
唯我独甜
2楼-- · 2020-05-11 10:02

Here's a solution similar to Grigor's. Two main differences are that this solution stores the length of the sequential set instead of other indexes and that this eliminates the need for the last hash-set iteration.

  1. Iterate over the array

    • Build a hashmap by looking for and updating adjacent set endpoints:

      Key - The array values

      Value - When the key is an endpoint of a sequential set, store the length of that set. Otherwise, keep it truthy so you only consider things once.

    • If the current set size is longest, update the longest set size and longest set start.

Here's a JavaScript implementation for clarity, as well as a fiddle to see it in action:

var array = [1,3,5,7,4,6,10];

//Make a hash of the numbers - O(n) assuming O(1) insertion
var longestSetStart;
var longestSetSize = 0;

var objArray = {};
for(var i = 0; i < array.length; i++){
    var num = array[i];

    if(!objArray[num]){//Only consider numbers once
        objArray[num] = 1;//Initialize to 1 item in the set by default

        //Get the updated start and end of the current set
        var currentSetStart = num;//Starting index of the current set
        var currentSetEnd = num;//Ending index of the current set

        //Get the updated start of the set
        var leftSetSize = objArray[num - 1];
        if(leftSetSize){
            currentSetStart = num - leftSetSize;
        }

        //Get the updated end of the set
        var rightSetSize = objArray[num + 1];
        if(rightSetSize){
            currentSetEnd = num + rightSetSize;
        }

        //Update the endpoints
        var currentSetSize = currentSetEnd - currentSetStart + 1;
        objArray[currentSetStart] = currentSetSize;
        objArray[currentSetEnd] = currentSetSize;

        //Update if longest set
        if(currentSetSize > longestSetSize){
            longestSetSize = currentSetSize;
            longestSetStart = currentSetStart;
        }
    }
}

var longestSetEnd = longestSetStart + longestSetSize - 1;
查看更多
别忘想泡老子
3楼-- · 2020-05-11 10:07

Disclaimer: Since the solution is based on hashtables, the running times are expected, not worst case.

This O(n) solution depends on the integers being unique. If they are not unique, make a hashset with O(1) insertion and membership lookup, and just skip the numbers already encountered, as you go through the list.

  1. Make a O(1) lookup/insert hashmap where the values are the beginnings of ranges, and the keys are the numbers that fit at the end of those ranges. For a value v and a key k, this means that the range starting from v and ending with k-1 inclusive is located at key k.

  2. Go through the list of numbers. For each number n check if the map has a value v at key n. This corresponds to there being a range starting from v that would allow n at the end. If there is, move v to key n+1 and delete the entry at key n. If there isn't any range, insert n at key n+1.

  3. Since the numbers are unique, none of the ranges overlap in the end, but there may be some contiguous ones. Run through the key/value pairs of the map. For each key k and value v, if the map has a value v1 at key k1 = v, then it means that there is a range from v1 to k-1. Insert v1 at k, and delete the entry k1/v1.

  4. Go through the k/v entries of the map to find the largest range [v,k-1] of size k-v, using a running maximum.

For your example:

setup:
l = [1,3,5,7,4,6,10]
m = {}

iteration:
process 1  : m = {2->1}
process 3  : m = {2->1, 4->3}
process 5  : m = {2->1, 4->3, 6->5}
process 7  : m = {2->1, 4->3, 6->5, 8->7}
process 4  : m = {2->1, 5->3, 6->5, 8->7}
process 6  : m = {2->1, 5->3, 7->5, 8->7}
process 10 : m = {2->1, 5->3, 7->5, 8->7, 11->10}

concatenation of contiguous ranges:
initial:              m = {2->1, 5->3, 7->5, 8->7, 11->10}
first concatenation:  m = {2->1,       7->3, 8->7, 11->10}, k=7, v=5, k1=5, v1=3
second concatenation: m = {2->1,             8->3, 11->10}, k=8, v=7, k1=7, v1=3

result:
largest range : [3,7] of size 5
查看更多
仙女界的扛把子
4楼-- · 2020-05-11 10:08

That would be linear considering dictionaries built with average O(1) hash tables.

L = [1,3,5,7,4,6,10]

a_to_b = {}
b_to_a = {}

for i in L:
    if i+1 in a_to_b and i-1 in b_to_a:
        new_a = b_to_a[i-1]
        new_b = a_to_b[i+1]
        a_to_b[new_a] = new_b
        b_to_a[new_b] = new_a
        continue
    if i+1 in a_to_b:
        a_to_b[i] = a_to_b[i+1]
        b_to_a[a_to_b[i]] = i
    if i-1 in b_to_a:
        b_to_a[i] = b_to_a[i-1]
        a_to_b[b_to_a[i]] = i
    if not (i+1 in a_to_b or i-1 in b_to_a):
        a_to_b[i] = i
        b_to_a[i] = i

max_a_b = max_a = max_b = 0
for a,b in a_to_b.iteritems():
    if b-a > max_a_b:
        max_a = a
        max_b = b
        max_a_b = b-a

print max_a, max_b  
查看更多
爷的心禁止访问
5楼-- · 2020-05-11 10:12

The trick is to think of the items as a set instead of a list. This allows you to identify items that are at the start or end of contiguous ranges, because a set lets you check if item-1 or item+1 is present. With that, you can solve the problem in linear time and space.

Pseudo-Code:

  • Enumerate the items in the set, looking for ones that are at the start of a range (x starts a range when x-1 is not in the set).
  • For each value that is the start of a range, scan upwards until you find the corresponding end of range value (x ends a range when x+1 is not in the set). This gives you all the relevant contiguous ranges.
  • Return the contiguous range whose end was furthest from its start.

C# Code:

static Tuple<int, int> FindLargestContiguousRange(this IEnumerable<int> items) {
    var itemSet = new HashSet<int>(items);

    // find contiguous ranges by identifying their starts and scanning for ends
    var ranges = from item in itemSet

                 // is the item at the start of a contiguous range?
                 where !itemSet.Contains(item-1)

                 // find the end by scanning upward as long as we stay in the set
                 let end = Enumerable.Range(item, itemSet.Count)
                           .TakeWhile(itemSet.Contains)
                           .Last()

                 // represent the contiguous range as a tuple
                 select Tuple.Create(item, end);

     // return the widest contiguous range that was found
     return ranges.MaxBy(e => e.Item2 - e.Item1);
}

note: MaxBy is from MoreLinq

Testing

Small sanity check:

new[] {3,6,4,1,8,5}.FindLargestContiguousRange().Dump();
// prints (3, 6)

Big contiguous list:

var zeroToTenMillion = Enumerable.Range(0, (int)Math.Pow(10, 7)+1);
zeroToTenMillion.FindLargestContiguousRange().Dump();
// prints (0, 10000000) after ~1 seconds

Big fragmented list:

var tenMillionEvens = Enumerable.Range(0, (int)Math.Pow(10, 7)).Select(e => e*2);
var evensWithAFewOdds = tenMillionEvens.Concat(new[] {501, 503, 505});
evensWithAFewOdds.FindLargestContiguousRange().Dump();
// prints (500, 506) after ~3 seconds

Complexity

This algorithm requires O(N) time and and O(N) space, where N is the number of items in the list, assuming the set operations are constant time.

Note that if the set was given as an input, instead of being built by the algorithm, we would only need O(1) space.

(Some comments say this is quadratic time. I think they assumed all items, instead of just items at the starts of ranges, triggered scans. That would indeed be quadratic, if the algorithm worked that way.)

查看更多
叛逆
6楼-- · 2020-05-11 10:14

1 idea: well, I think you have to kinda sort the list anyway, but you can't go with merge or quick sort. But if you have memory, you could use idea from counting sort for integers.

So you can create array of 0 and 1, from 0 to max int value, then fill it with ones if you have value and then find maximum continous array

2 idea: create dictionary of values, find min and max - all O(N) operations:

dict = {1: 1, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 10: 10}
min = 1
max = 10

then, go like i in range(min, max) and and find longest continuous subset

>>> d = [1, 3, 5, 7, 4, 6, 10]
>>> s = set(d)
>>> mind = min(d)
>>> maxd = max(d)
>>> a, b, j = 0, 0, 0

>>> for i in range(mind, maxd):
        if i not in s:
            if (b - a) < (i - j - 1):
                a, b = j, i - 1
            j = i + 1

>>> a, b
(3, 7)

but this could be slow for sparse lists like [1, 9000, 100000]

EDIT: based on super great answer of Grigor Gevorgyan, here's the code for O(N) dictionary solution in Python (I just love it's simplicity!!!)

l = [1, 3, 5, 7, 4, 6, 10]
d = {x:None for x in l}
print d
for (k, v) in d.iteritems():
    if v is not None: continue
    a, b = d.get(k - 1), d.get(k + 1)
    if a is not None and b is not None: d[k], d[a], d[b] = k, b, a
    elif a is not None: d[a], d[k] = k, a
    elif b is not None: d[b], d[k] = k, b
    else: d[k] = k
    print d

m = max(d, key=lambda x: d[x] - x)
print m, d[m]

output:

{1: None, 3: None, 4: None, 5: None, 6: None, 7: None, 10: None}
{1: 1, 3: None, 4: None, 5: None, 6: None, 7: None, 10: None}
{1: 1, 3: 3, 4: None, 5: None, 6: None, 7: None, 10: None}
{1: 1, 3: 4, 4: 3, 5: None, 6: None, 7: None, 10: None}
{1: 1, 3: 5, 4: 3, 5: 3, 6: None, 7: None, 10: None}
{1: 1, 3: 6, 4: 3, 5: 3, 6: 3, 7: None, 10: None}
{1: 1, 3: 7, 4: 3, 5: 3, 6: 3, 7: 3, 10: None}
{1: 1, 3: 7, 4: 3, 5: 3, 6: 3, 7: 3, 10: 10}
3 7
查看更多
We Are One
7楼-- · 2020-05-11 10:20

I know a solution based on hashing and dynamic programming. Let f(x) be the hash function. The trick is the hash-table value. Consider the longest interval contained in the list, which either starts or ends with x. Then h[f(x)] = y, where y is the other end of that interval. Note that length of that interval will be abs( x - y ) + 1. The algorithm description will make clear why to store that value.

Move over the list. Let i be current index, x := list[ i ] - current number. Now

1. if h[f(x)] is not empty, then we've met number x before. Nothing to do, continue.

2. Check h[f(x-1)] and h[f(x+1)].

2.1. If they're both not empty, that means we've already met x-1 and x+1, and we know some intervals [a..x-1] and [x+1..b] which we've already met in the list. We know it because a=h[f(x-1)] and b=h[f(x+1)] by definition of h. Now when we got x, it means that we now have met the whole interval [a,b], so we update values as follows: h[f(a)] := b and h[f(b)] := a.
Also set h[f(x)] to some value (let's say x, not to impact the answer), just so that next time we meet x in the list, we ignore it. x has already done his job.

2.2. If only one of them is set, let's say h[f(x-1)] = a, that means we've already met some interval [a..x-1], and now it's extended with x. Update will be h[f(a)] := x and h[f(x)] := a.

2.3. If none of them is set, that means we've met neither x-1, nor x+1, and the biggest interval containing x we've already met is the single [x] itself. So set h[f(x)] := x.

Finally, to get the answer, pass over the whole list and take maximum abs( x - h[f(x)] ) + 1 for all x.

查看更多
登录 后发表回答