可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

My problem is as follows:

having file with list of intervals:

And a range of

0 200

I would like to do such an intersection that will report the positions [start end] between my intervals inside the given range.

For example:

8 9
12 20
30 200

Beside any ideas how to bite this, would be also nice to read some thoughts on optimization, since as always the input files are going to be huge.

回答1:

this solution works as long the intervals are ordered by the start point and does not require to create a list as big as the total range.

code

with open("0.txt") as f:
    t=[x.rstrip("\n").split("\t") for x in f.readlines()]
    intervals=[(int(x[0]),int(x[1])) for x in t]

def find_ints(intervals, mn, mx):
    next_start = mn
    for x in intervals:
        if next_start < x[0]:
            yield next_start,x[0]
            next_start = x[1]
        elif next_start < x[1]:
            next_start = x[1]
    if next_start < mx:
        yield next_start, mx

print list(find_ints(intervals, 0, 200))

output:

(in the case of the example you gave)

[(0, 1), (8, 9), (12, 20), (30, 200)]

回答2:

Rough algorithm:

create an array of booleans, all set to false seen = [False]*200
Iterate over the input file, for each line start end set seen[start] .. seen[end] to be True
Once done, then you can trivially walk the array to find the unused intervals.

In terms of optimisations, if the list of input ranges is sorted on start number, then you can track the highest seen number and use that to filter ranges as they are processed - e.g. something like

for (start,end) in input:
  if end<=lowest_unseen:
    next
  if start<lowest_unseen:
    start=lowest_unseen
  ...

which (ignoring the cost of the original sort) should make the whole thing O(n) - you go through the array once to tag seen/unseen and once to output unseens.

Seems I'm feeling nice. Here is the (unoptimised) code, assuming your input file is called input

seen = [False]*200
file = open('input','r')
rows = file.readlines()
for row in rows:
  (start,end) = row.split(' ')
  print "%s %s" % (start,end)
  for x in range( int(start)-1, int(end)-1 ):
    seen[x] = True

print seen[0:10]

in_unseen_block=False
start=1
for x in range(1,200):
  val=seen[x-1]
  if val and not in_unseen_block:
    continue
  if not val and in_unseen_block:
    continue
  # Must be at a change point.
  if val:
    # we have reached the end of the block
    print "%s %s" % (start,x)
    in_unseen_block = False
  else:
    # start of new block
    start = x
    in_unseen_block = True
# Handle end block
if in_unseen_block:
  print "%s %s" % (start, 200)

I'm leaving the optimizations as an exercise for the reader.

回答3:

If you make a note every time that one of your input intervals either opens or closes, you can do what you want by putting together the keys of opens and closes, sort into an ordered set, and you'll be able to essentially think, "okay, let's say that each adjacent pair of numbers forms an interval. Then I can focus all of my logic on these intervals as discrete chunks."

myRange = range(201)
intervals = [(1,5), (2,8), (9,12), (20,30)]
opens = {}
closes = {}

def open(index):
    if index not in opens:
        opens[index] = 0
    opens[index] += 1

def close(index):
    if index not in closes:
        closes[index] = 0
    closes[index] += 1

for start, end in intervals:
    if end > start: # Making sure to exclude empty intervals, which can be problematic later
        open(start)
        close(end)

# Sort all the interval-endpoints that we really need to look at
oset = {0:None, 200:None}
for k in opens.keys():
    oset[k] = None
for k in closes.keys():
    oset[k] = None
relevant_indices = sorted(oset.keys())

# Find the clear ranges
state = 0
results = []
for i in range(len(relevant_indices) - 1):
    start = relevant_indices[i]
    end = relevant_indices[i+1]

    start_state = state
    if start in opens:
        start_state += opens[start]
    if start in closes:
        start_state -= closes[start]

    end_state = start_state
    if end in opens:
        end_state += opens[end]
    if end in closes:
        end_state -= closes[end]
    state = end_state

    if start_state == 0:
        result_start = start
        result_end = end
        results.append((result_start, result_end))

for start, end in results:
    print(str(start) + " " + str(end))

This outputs:

The intervals don't need to be sorted.

回答4:

This question seems to be a duplicate of Merging intervals in Python.

If I understood well the problem, you have a list of intervals (1 5; 2 8; 9 12; 20 30) and a range (0 200), and you want to get the positions outside your intervals, but inside given range. Right?

There's a Python library that can help you on that: python-intervals (also available from PyPI using pip). Disclaimer: I'm the maintainer of that library.

Assuming you import this library as follows:

import intervals as I

It's quite easy to get your answer. Basically, you first want to create a disjunction of intervals based on the ones you provide:

inters = I.closed(1, 5) | I.closed(2, 8) | I.closed(9, 12) | I.closed(20, 30)

Then you compute the complement of these intervals, to get everything that is "outside":

compl = ~inters

Then you create the union with [0, 200], as you want to restrict the points to that interval:

print(compl & I.closed(0, 200))

This results in:

[0,1) | (8,9) | (12,20) | (30,200]

Python interval interesction

问题:

回答1:

code

output:

回答2:

回答3:

回答4:

收藏的人(0)

Python interval interesction

问题:

回答1:

code

output:

回答2:

回答3:

回答4:

收藏的人(0)

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮