Optimization of list's sublist

2020-07-19 03:40发布

问题:

the problem is to find total number of sub-lists from a given list that doesn't contain numbers greater than a specified upper bound number say right and sub lists max number should be greater than a lower bound say left .Suppose my list is: x=[2, 0, 11, 3, 0] and upper bound for sub-list elements is 10 and lower bound is 1 then my sub-lists can be [[2],[2,0],[3],[3,0]] as sub lists are always continuous .My script runs well and produces correct output but needs some optimization

def query(sliced,left,right):
    end_index=0
    count=0
    leng=len(sliced)
    for i in range(leng):
        stack=[]
        end_index=i

        while(end_index<leng and sliced[end_index]<=right):

            stack.append(sliced[end_index])
            if max(stack)>=left:
                count+=1
            end_index+=1

    print (count)

origin=[2,0,11,3,0]
left=1
right=10
query(origin,left,right)

output:4

for a list say x=[2,0,0,1,11,14,3,5] valid sub-lists can be [[2],[2,0],[2,0,0],[2,0,0,1],[0,0,1],[0,1],[1],[3],[5],[3,5]] total being 10

回答1:

Brute force

Generate every possible sub-list and check if the given criteria hold for each sub-list.

Worst case scenario: For every element e in the array, left < e < right. Time complexity: O(n^3)

Optimized brute force (OP's code)

For every index in the array, incrementally build a temporary list (not really needed though) which is valid.

Worst case scenario: For every element e in the array, left < e < right. Time complexity: O(n^2)

A more optimized solution

If the array has n elements, then the number of sub-lists in the array is 1 + 2 + 3 + ... + n = (n * (n + 1)) / 2 = O(n^2). We can use this formula strategically.

First, as @Tim mentioned, we can just consider the sum of the sub-lists that do not contain any numbers greater than right by partitioning the list about those numbers greater than right. This reduces the task to only considering sub-lists that have all elements less than or equal to right then summing the answers.

Next, break apart the reduced sub-list (yes, the sub-list of the sub-list) by partitioning the reduced sub-list about the numbers greater than or equal to left. For each of those sub-lists, compute the number of possible sub-lists of that sub-list of sub-lists (which is k * (k + 1) / 2 if the sub-list has length k). Once that is done for all the the sub-lists of sub-lists, add them together (store them in, say, w) then compute the number of possible sub-lists of that sub-list and subtract w.

Then aggregate your results by sum.

Worst case scenario: For every element e in the array, e < left.

Time Complexity: O(n)

I know this is very difficult to understand, so I have included working code:

def compute(sliced, lo, hi, left):
    num_invalid = 0
    start = 0
    search_for_start = True
    for end in range(lo, hi):
        if search_for_start and sliced[end] < left:
            start = end
            search_for_start = False
        elif not search_for_start and sliced[end] >= left:
            num_invalid += (end - start) * (end - start + 1) // 2
            search_for_start = True
    if not search_for_start:
        num_invalid += (hi - start) * (hi - start + 1) // 2
    return ((hi - lo) * (hi - lo + 1)) // 2 - num_invalid

def query(sliced, left, right):
    ans = 0
    start = 0
    search_for_start = True
    for end in range(len(sliced)):
        if search_for_start and sliced[end] <= right:
            start = end
            search_for_start = False
        elif not search_for_start and sliced[end] > right:
            ans += compute(sliced, start, end, left)
            search_for_start = True
    if not search_for_start:
        ans += compute(sliced, start, len(sliced), left)
    return ans


回答2:

Categorise the numbers as small, valid and large (S,V and L) and further index the valid numbers: V_1, V_2, V_3 etc. Let us start off by assuming there are no large numbers.

Consider the list A = [S,S,…,S,V_1, X,X,X,X,…X] .If V_1 has index n, there are n+1, subsets of the form [V_1], [S,V_1], [S,S,V_1] and so on. And for each of these n+1 subsets, we can append the len(A)-n-1 sequences: [X], [XX], [XXX] and so on. Giving a total of (n+1)(len(A)-n) subsets containing V_1. But we can partition the set of all subsets by those containing V_k but no V_n for n less than k. Hence we must then, simply perform the same calculation on the remaining XXX…X part of the list using V_2 and itterate. This would require something like this:

def query(sliced,left,right,total):
    index=0
    while index<len(sliced):
        if sliced[index]>=left:
            total+=(index+1)*(len(sliced)-index)
            return total+query(sliced[index+1:],left,right,0)
        else:
            index+=1
    return total

To incorporate the large numbers, we can just partition the whole set according to where the large numbers occur and add the total number of sequence for each. If we call our first function, sub_query, then we arrive at the following:

def sub_query(sliced,left,right,total):
    index=0
    while index<len(sliced):
        if sliced[index]>=left:
            total+=(index+1)*(len(sliced)-index)
            return total+sub_query(sliced[index+1:],left,right,0)
        else:
            index+=1
    return total

def query(sliced,left,right):
    index=0
    count=0
    while index<len(sliced):
        if sliced[index]>right:
            count+=sub_query(sliced[:index],left,right,0)
            sliced=sliced[index+1:]
            index=0
        else:
            index+=1
    count+=sub_query(sliced,left,right,0)
    print (count)

This seems to run through the list and check for max/min values fewer times. Note it doesn’t distinguish between sub-lists that are the same but from different positions in the original list (as would arise from a list such as [0,1,0,0,1,0]. But the code from the original post wouldn’t do that either, so I am guessing this is not a requirement.