Union of set of ranges in Scheme

2019-07-15 11:54发布

问题:

I need to write a Scheme function (union s1 s2) that when given two sets, let's say

s1 = ((1 3) (5 13) (25 100))
s2 = ((2 4) (17 26) (97 100))

will give

(union s1 s2) ----> ((1 4) (5 13) (17 100))

Also if

s1 = ((1 3) (5 13) (25 110) (199 300))
s2 = ((2 4) (17 26) (97 100) (110 200) (288 500))

then

(union s1 s2) ----> ((1 4) (5 13) (17 500))

Can anybody suggest me how to do this? I have no idea how to start.

回答1:

Instead of working with 2 sets, I would suggest

  1. merging the 2 sets into one combined list (ordered on the car)
  2. making a second pass to actually merge the elements, maintaining an accumulator list (result, in reversed order).

That way, you always compare the first and second element of the input (merged) list, and you know that they are ordered, which greatly simplifies your code:

  • empty merged list => return result
  • one element left => push this element onto the result, and return result, reversed back
  • at least two elements in list
    • if the 2 elements don't overlap, push first element onto the result, and loop on the rest of the merged list,
    • otherwise replace the two elements with the merged element, then loop

Example execution (your 2nd):

(union '((1 3) (5 13) (25 110) (199 300)) 
       '((2 4) (17 26) (97 100) (110 200) (288 500)))

lst: ((1 3) (2 4) (5 13) (17 26) (25 110) (97 100) (110 200) (199 300) 
      (288 500))
res: ()

lst: ((1 4) (5 13) (17 26) (25 110) (97 100) (110 200) (199 300) (288 500))
res: ()

lst: ((5 13) (17 26) (25 110) (97 100) (110 200) (199 300) (288 500))
res: ((1 4))

lst: ((17 26) (25 110) (97 100) (110 200) (199 300) (288 500))
res: ((5 13) (1 4))

lst: ((17 110) (97 100) (110 200) (199 300) (288 500))
res: ((5 13) (1 4))

lst: ((17 110) (110 200) (199 300) (288 500))
res: ((5 13) (1 4))

lst: ((17 200) (199 300) (288 500))
res: ((5 13) (1 4))

lst: ((17 300) (288 500))
res: ((5 13) (1 4))

lst: ((17 500))
res: ((5 13) (1 4))

lst: ()
res: ((17 500) (5 13) (1 4))

final result: ((1 4) (5 13) (17 500))

I wrote the code, it's 11 lines long and quite simple if you follow this approach.

EDIT

Since you ask for the code, here's the initial version I wrote:

(define (union set1 set2)
  (let loop ([lst (sort (append set1 set2) < #:key car)] [res null])
    (if (null? lst)
        (reverse res)
        (let ([e1 (car lst)] [cd (cdr lst)])
          (if (null? cd)
              (loop null (cons e1 res))
              (let ([e2 (car cd)] [e1y (cadr e1)])
                (if (> (car e2) e1y) 
                    (loop cd (cons e1 res))
                    (loop (cons (list (car e1) (max e1y (cadr e2))) 
                                (cdr cd)) 
                          res))))))))

Optionally, if you want/need to avoid append and sort, you could have your own merge procedure like so:

(define (merge lst1 lst2 cmp key)
  (let loop ((lst1 lst1) (lst2 lst2) (res null))
    (cond 
      ((and (null? lst1) (null? lst2))
       (reverse res))
      ((and (not (null? lst1)) 
            (or (null? lst2) 
                (cmp (key lst1) (key lst2))))
       (loop (cdr lst1) lst2 (cons (car lst1) res)))
      (else                            
       (loop lst1 (cdr lst2) (cons (car lst2) res))))))

and replace the second line of the union procedure with

  (let loop ([lst (merge set1 set2 < caar)] [res null])

Hope this helps.



回答2:

Sounds like a fun problem! It also sounds like a homework problem, so I'm going to try to point you to the answer rather than writing it for you myself. My apologies in advance if I'm misunderstanding the situation.

First, I'm assuming that the individual range sets are ordered, and non-overlapping.

This problem is one that fits well into the recursive mold, with a twist. Specifically, it's an example of "iterating over two complex pieces of data", section 17 of How To Design programs:

http://www.htdp.org/2003-09-26/Book/curriculum-Z-H-22.html#node_chap_17

Even more specifically, you're in case 3, where you actually need to consider all possible combinations.

In fact, it's even a bit worse than that, because in the case where both sets are nonempty, you care about which interval starts lower.

To get started, you're going to want to

  • define the classes of input data, and
  • develop a good set of examples. Make sure to include examples where both sets are empty, where just the first is empty, where just the second is empty, and where neither is empty.

Follow the HtDP template, and you should be all right. This is a tricky problem, though.



回答3:

This is a very simple problem. Each your range specification consists of list of intervals, ordered on their car values (and moreover, any element's cadr is smaller than the following element's car).

You just start with an empty ranges specification, a list of one empty range: ( () ).

Then, on each step you take one head element from one of the lists that has the smallest car value (of the two, since the lists are ordered, remember?) , and add it into your ranges specification. When both lists are exhausted (empty), you're done.

More specifically, it will be beneficial to maintain the resulting ranges specification in reversed order, separately holding a most recent range. If the incoming range (from one of the two heads) merges with it, you update the last range. If not, you push the last range into the reversed list, and hold the incoming head as the new last range.

Finishing up is equally trivial. It will also be easier to handle the inputs if you maintain the two lists in one list (list s1 s2) and write a special function get-input that takes this list and produces its result as a list, next-range and next two-lists (or () to signal that the both lists are exhausted).

Just so you know, it is an instance of a general pattern of iteration (usually coded in Scheme with named-let or otherwise tail-recursive functions, but there's also an explicit do construct), processing an input list's elements one by one, combining them with an accumulator. Keeping previous parts of accumulator reversed is a pattern too, viz. a zipper.


An example (your 2nd):

s1 = ((1 3) (5 13) (25 110) (199 300))
s2 = ((2 4) (17 26) (97 100) (110 200) (288 500)) 

list-of-ranges   last-range          two-input-lists
  ()               ()          ( ((1 3) ...)       ((2 4) ...)     )
  ()              (1 3)        ( ((5 13) ...)      ((2 4) ...)     )
  ()              (1 4)        ( ((5 13) ...)      ((17 26) ...)   )
 ((1 4))          (5 13)       ( ((25 110) ...)    ((17 26) ...)   )
((5 13)(1 4))     (17 26)      ( ((25 110) ...)    ((97 100) ...)  )
((5 13)(1 4))     (17 110)     ( ((199 300))       ((97 100) ...)  )
((5 13)(1 4))     (17 110)     ( ((199 300))       ((110 200) ...) )
((5 13)(1 4))     (17 200)     ( ((199 300))       ((288 500))     )
((5 13)(1 4))     (17 300)     ( ()                ((288 500))     )
((5 13)(1 4))     (17 500)     ( ()                ()              )
STOP


回答4:

Assuming r5rs scheme with no srfi, I would suggest making a function with an inner recursive accumulator function that is initially given s1, s2, intvl-acc='(), set-acc='(), last two ones holding temporary interval union and list of accumulated unions.

It should follow the following conditions: (I'll write in pseudocode, dividing into three cases.)

[base case]

(and (null? s1) (null? s2))
  (if (null? intvl-acc)
      reverse(set-acc)
      (union-acc '() '() '() (cons intvl-acc set-acc)))

[accumulating case]

(not (null? intvl-acc))
  (compare (cadr intvl-acc) with caar of each set)
  (if s1 is smaller, (union-acc (cdr s1) s2 updated-intvl-acc set-acc))
  (if not found, (union-acc s1 s2 '() (cons intvl-acc set-acc)))

[otherwise:]

else
  ((compare the caar of s1 s2)
   (if s1 is smaller, (union-acc (cdr s1) s2 (car s1) set-acc)

Hope it helped.



标签: scheme