Python complete search in one pass function

2019-08-19 06:16发布

问题:

I am writing a program that takes in a list of start and end times for farmers milking cows and determines both the longest time where >=1 cow is being milked and the longest time where no cows are being milk.

In it, I've tried using this function. It's an exercise on complete search, but this isn't fast enough when there's a lot of data (I think because there are n^2 iterations).

timesIS is simply a list of the times in increasing start order, and timesDE is a list of the same times by decreasing end. timeIndex is the position from which to start. For the longest interval of milking, my program later does this for every index and returns the longest interval.

While still keeping to a complete search, how can I make this more efficient (switch to something closer to n passes, perhaps)?

def nextCease(TimesIS, timesDE, timeIndex):
    latestTime = TimesIS[timeIndex][1]
    for j in range (0, len(timesDE)):
        for i in range (0, len(timesDE)):
            if timesDE[i][0]<=latestTime and timesDE[i][1]>=latestTime:
                latestTime = timesDE[i][1]
        if latestTime == timesDE[0][1]:
            return latestTime
            break
    return latestTime

Here's a small piece of data input (first line is just the number of farmers):

6
100 200
200 400
400 800
800 1600
50 100
1700 3200

I think this is a minimal, complete, and verifiable example:

from operator import itemgetter
times = [[100,200], [200,400], [400,800], [800,1600], [50,100], [1700,3200]

def nextCease(TimesIS, timesDE, timeIndex):
    latestTime = TimesIS[timeIndex][1]
    for j in range (0, len(timesDE)):
        for i in range (0, len(timesDE)):
            if timesDE[i][0]<=latestTime and timesDE[i][1]>=latestTime:
                latestTime = timesDE[i][1]
        if latestTime == timesDE[0][1]:
            return latestTime
            break
    return latestTime

timesIS = sorted(times[:], key=itemgetter(0)) #increasing starttimes
timesDE = sorted(times[:], key=itemgetter(1), reverse=True) #decreasing endtimes

longestIntervalMilk = 0
for i in range (0, len(times)):
    interval = nextCease(timesIS, timesDE, i) - timesIS[i][0]
    if interval > longestIntervalMilk:
        longestIntervalMilk = interval

longestIntervalNoMilk = 0
latestFinish = 0
for i in range (0, len(times)):
    latestFinish = nextCease(timesIS, timesDE, i)
    timesIS2 = timesIS[:]
    while(timesIS2[0][0] < latestFinish):
        nextStartExists = True
        del timesIS2[0]
        if timesIS2 == []:
            nextStartExists = False
            break
    if nextStartExists == True:
        nextStart = timesIS2[0][0]
        longestIntervalNoMilk = nextStart - latestFinish

print(str(longestIntervalMilk) + " " + str(longestIntervalNoMilk) + "\n"))

EDIT: In the meantime, I wrote up this. It gives the wrong output for a very long list (it's 1001 lines so I won't reprint it here, but you can find it at http://train.usaco.org/usacodatashow?a=iA4oZAAX7KZ) and I'm confused as to why:

times = sorted(times[:], key=itemgetter(0))

def longestMilkInterval(times):
    earliestTime = times[0]
    latestTime = times[0][1]
    interval = 0
    for i in range (1, len(times)):
        if times[i][1] > latestTime and times[i][0] <= latestTime:
            if times[i][1] - earliestTime[0] > interval:
                interval = times[i][1] - earliestTime[0]
                latestTime = times[i][1]
        else:
            earliestTime = times[i]
            latestTime = times[i][1]
            print(earliestTime)
    return interval

def longestNoMilkInterval(times):
    earliestTime = times[0][1]
    interval = 0
    for i in range (0, len(times)):
        if times[i][0] >= earliestTime:
            if times[i][0] - earliestTime > interval:
                interval = times[i][0] - earliestTime
                break
        else:
            earliestTime = times[i][1]
    return interval

Output should be 912 184 (>=1 cow, 0 cow).

回答1:

Here is a very straightforward approach which does it in one pass, including a sort, so the complexity would be O(n*logn).

# Part 1: transform to tuples (start_time, typ)
items = []
for start, end in times:
    items += [(start, 's'), (end, 'e')]
items = sorted(items)

# Part 2: compute max durations where 0 or 1+ cows are being milked
max_0_cows = max_1plus_cows = 0

last_intersection_time = items[0][0] # starting with first cow milk time
nof_cows_milked = 1

for i, (start_time, typ) in enumerate(items[1:], 1):
    if items[i-1][0] == start_time and items[i-1][1] != typ:
        continue
    if i+1 < len(items) and items[i+1][0] == start_time and items[i+1][1] != typ:
        continue

    if typ == 's':
        nof_cows_milked += 1
    elif typ == 'e':
        nof_cows_milked -= 1

    # check if we cross from 1+ -> 0 or 0 -> 1+
    if (typ, nof_cows_milked) in (('e', 0), ('s', 1)):
        duration = start_time - last_intersection_time
        if nof_cows_milked == 1:
            max_0_cows = max(max_0_cows, duration)
        if nof_cows_milked == 0:
            max_1plus_cows = max(max_1plus_cows, duration)
        last_intersection_time = start_time

print("Max time 0 cows: {}, Max time 1+ cows: {}".format(max_0_cows, max_1plus_cows))
  1. Building of items: It puts the start/end itervals into a list of tuples (start_time, typ) so we can traverse the list and if we see a s a new cow is being milked and e then a cow is stopped being milked. This way we can have a counter nof_cows_milked at any time which is the basis for getting the "max time 0 cows milked" and "max time 1+ cows milked"
  2. The actual longest-time-finder checks for all the transitions from 0 -> 1+ cows milked or 1+ cows -> 0 cows milked. In the first 4 lines it filters out the cases when two adjacent itervals (one farmer stops when the other farmer starts) It keeps track of those times with last_intersection_time and compares the duration to the max duration of max_0_cows and max_1_plus_cows. Again, this part is not very pretty, maybe there are more elegant ways to solve that.

[my algorithm] gives the wrong output [...] and I'm confused as to why

Your algorithm basically just checks for the longest interval of a single tuple, but doesn't check for overlapping or adjacent tuples.

Take these intervals as an visual example:

Your code just finds the interval G-H, whereas you need to find C-F. You somewhere need to keep track of how many cows are milked in parallel, so you need at least the nof_of_cows_milked counter as in my code example.