remove duplicate of a list in O(nlogn)

2019-06-06 17:04发布

问题:

How to remove duplicate of a list? (running time is O(n log n) ) ex: '(4 6 1 1 2 3 3 5 6) => '(4 6 1 2 3 5)

 (define (re-dup lst)
   (cond
      ((empty? lst) empty)
      (else
      (define el (first lst))
      (define el-free-lst (filter (lambda (x) (not (= el x))) (rest lst)))
      (cons el (re-dup el-free-lst)))))

Is this right?

回答1:

Your current solution is O(n^2), because filter traverses the list once for each of the elements in the original list. It's possible to write an O(n) solution using a helper data structure with constant time insertion and membership operations to keep track of the elements that have been found already.

In Racket, we have an out-of-the-box set data structure which, for constant time operations, "actually requires O(log N) time for a set of size N" (see the documentation), therefore the set-member? and set-add procedures will be O(log n). So this implementation using Racket's set is not optimal, but we achieve the O(n log n) target:

(define (re-dup lst)
  (let loop ((seen (set))
             (lst lst)
             (acc '()))
    (cond ((null? lst)
           (reverse acc))
          ((set-member? seen (car lst))
           (loop seen (cdr lst) acc))
          (else
           (loop (set-add seen (car lst))
                 (cdr lst)
                 (cons (car lst) acc))))))

It works as expected, preserving the original order in the list (which is a constraint for this problem, as stated in the comments) at the cost of one additional O(n) reverse operation:

(re-dup '(4 6 1 1 2 3 3 5 6))
=> '(4 6 1 2 3 5)


回答2:

You can get O(n log n) behavior with the following:

  1. Merge sort (which is O(n log n))
  2. Traverse and discard duplicates (which is O(n)

Thus overall O(n log n).

(define (re-dup lst)
  (if (null? lst)
      lst
      (let ((lst (list-sort < lst)))
        (let thinning ((skip (car lst)) (rest (cdr lst)))
          (cond ((null? rest) (list skip))
                ((equal? skip (car rest)) (thinning skip (cdr rest)))
                (else (cons skip (thinning (car rest) (cdr rest)))))))))


标签: scheme racket