Flattening a List of Lists

Posted 2019-01-19 23:54

Question:

I'm new to Scheme and functional programming in general. Can someone explain this code — specifically what kons and knil are? The goal is to flatten a list of lists.

(define (fold1 kons knil lst)  
  (if (null? lst)  
      knil  
      (fold1 kons (kons (car lst) knil) (cdr lst))))

I'm fairly certain kons is a function as it's being applied to two arguments but still not totally sure about its functionality.

Answer 1:

This is a (weird) fold

This is a generalized folding procedure. In Lisps, lists are represented by cons cells and the empty list, where each (proper) list is either the empty list (), or a cons cell whose car is an element of the list and whose cdr is the rest of the list. E.g., a list (1 2 3 4 5) can be produced by

(cons 1 (cons 2 (cons 3 (cons 4 (cons 5 '())))))

The fold1 function that you've shown:

(define (fold1 kons knil lst)
  (if (null? lst)
      knil
      (fold1 kons (kons (car lst) knil) (cdr lst))))

is a way of taking a list like the one shown above and transforming it into:

(kons 5 (kons 4 (kons 3 (kons 2 (kons 1 knil)))))

This is a fold, a general pattern that covers many common list operations. For instance, if you use 0 as knil and + as kons, you compute the sum of the elements in the list.
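
As a quick check of the sum example, here is a minimal sketch. It repeats the fold1 definition from the question so that it runs on its own; sum-list is a name introduced here purely for illustration.

```scheme
;; fold1 as given in the question, repeated so this snippet is self-contained.
(define (fold1 kons knil lst)
  (if (null? lst)
      knil
      (fold1 kons (kons (car lst) knil) (cdr lst))))

;; sum-list is a hypothetical helper name, not part of the original code.
(define (sum-list lst)
  (fold1 + 0 lst))

(sum-list '(1 2 3 4 5))
;=> 15, computed as (+ 5 (+ 4 (+ 3 (+ 2 (+ 1 0)))))
```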

Usually folds are right or left associative. A proper left-associative fold would transform to

(kons (kons (kons (kons (kons knil 1) 2) 3) 4) 5)

which might be clearer when viewed with + and infix notation:

(((((0 + 1) + 2) + 3) + 4) + 5)

The right associative fold would become

(1 + (2 + (3 + (4 + (5 + 0)))))

The left associative fold can be more efficient because its natural implementation is tail recursive and consumes elements in the same order that a list hands them out, front to back. E.g., in the proper left associative example, (kons knil 1) can be evaluated first to produce some value v, and then, in the same stack space, (kons v 2) can be evaluated, and so on. The right associative fold cannot apply kons until it has traversed to the end of the list, so a naïve implementation requires stack space proportional to the length of the list.
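
To make the difference concrete, here is a sketch of the two conventional folds. The names my-fold-left and my-fold-right are assumptions made for this example, chosen so they do not clash with fold procedures your Scheme may already provide. Using list as the combining function makes the nesting visible.

```scheme
;; Left fold: tail recursive; the accumulator is the FIRST argument to kons.
(define (my-fold-left kons knil lst)
  (if (null? lst)
      knil
      (my-fold-left kons (kons knil (car lst)) (cdr lst))))

;; Right fold: not tail recursive; kons is applied only after the
;; recursive call on the rest of the list has returned.
(define (my-fold-right kons knil lst)
  (if (null? lst)
      knil
      (kons (car lst) (my-fold-right kons knil (cdr lst)))))

(my-fold-left list 0 '(1 2 3))
;=> (((0 1) 2) 3)

(my-fold-right list 0 '(1 2 3))
;=> (1 (2 (3 0)))
```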

This fold1 mixes things up a bit, because it's processing the elements of the list in a left associative manner, but the order of the arguments to the combining function is reversed.
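
The same trick with list as the combining function shows how fold1 differs from both conventional folds. This is only an illustrative sketch; fold1 is repeated from the question so the snippet stands alone.

```scheme
(define (fold1 kons knil lst)
  (if (null? lst)
      knil
      (fold1 kons (kons (car lst) knil) (cdr lst))))

;; Elements are consumed left to right, but each element is passed as the
;; first argument and the accumulator as the second.
(fold1 list 0 '(1 2 3))
;=> (3 (2 (1 0)))

;; With cons and '(), the same shape yields the reversed list.
(fold1 cons '() '(1 2 3))
;=> (3 2 1)
```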

This type of definition can be used any time that you have an algebraic datatype. Since a list in Lisps is either the empty list, or an element and a list combined with cons, you can write a function that handles each of these cases, and produces a new value by “replacing” cons with a combination function and the empty list with some designated value.
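
The "replacing" idea is easiest to see with a right fold, sketched below under the assumption of a hand-written my-fold-right (a name made up for this example). Substituting cons and '() for themselves rebuilds the list, and substituting something else computes a different value over the same structure.

```scheme
;; my-fold-right is an illustrative name, defined here so the snippet is
;; self-contained.
(define (my-fold-right kons knil lst)
  (if (null? lst)
      knil
      (kons (car lst) (my-fold-right kons knil (cdr lst)))))

;; "Replace" cons with cons and '() with '(): the list is rebuilt unchanged.
(my-fold-right cons '() '(1 2 3))
;=> (1 2 3)

;; "Replace" cons with "add one to the count" and '() with 0: the length.
(my-fold-right (lambda (element count) (+ count 1)) 0 '(a b c))
;=> 3
```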

Flattening a list of lists

So, if you've got a list of lists, e.g., ((a b) (c d) (e f)), it's constructed by

(cons '(a b) (cons '(c d) (cons '(e f) '())))

With a right associative fold, you transform it to:

(append '(a b) (append '(c d) (append '(e f) '())))

by using append for kons, and '() for knil. However, in this slightly mixed up fold, your structure will be

(kons '(e f) (kons '(c d) (kons '(a b) knil)))

so knil can still be '(), but kons needs to be a function that calls append with its arguments swapped:

(define (flatten lists)
  (fold1 (lambda (right left)
           (append left right))
         '()
         lists))

And so we have:

(flatten '((a b) (c d) (e f)))
;=> (a b c d e f)
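
To see why the swap matters, the sketch below passes append to fold1 directly and compares the result with the swapped version. fold1 is repeated from the question so the snippet runs on its own.

```scheme
(define (fold1 kons knil lst)
  (if (null? lst)
      knil
      (fold1 kons (kons (car lst) knil) (cdr lst))))

;; Without the swap, each new sublist is placed in FRONT of the accumulator,
;; so the sublists come out in reverse order.
(fold1 append '() '((a b) (c d) (e f)))
;=> (e f c d a b)

;; With the swap, each new sublist is appended AFTER the accumulator.
(fold1 (lambda (right left) (append left right)) '() '((a b) (c d) (e f)))
;=> (a b c d e f)
```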

Flattening deeper lists of lists

Given that this is a folding exercise, I assumed that the list of lists is nested only one layer deep. However, since we've seen how to implement a simple flatten

(define (flatten lists)
  (fold1 (lambda (right left)
           (append left right))
         '()
         lists))

we can modify this to make sure that deeper lists are flattened, too. The current kons function

(lambda (right left)
  (append left right))

simply appends the two lists together. left is the already appended and flattened list that we've been building up, and right is the new component that we're tacking on now. If we call flatten on right as well, arbitrarily nested lists should be flattened:

(define (flatten lists)
  (fold1 (lambda (right left)
           (append left (flatten right)))  ; recursively flatten sublists
         '()
         lists))

This is almost right, except that now when we call (flatten '((a b) (c d))), we'll end up making a call to (flatten '(a b)), which will in turn make a call to (flatten 'a). But flatten is a wrapper for fold1, and fold1 expects its last argument to be a list, so we need to decide what flatten should do when it is called with a non-list. A simple approach is to have it return a list containing the non-list argument; that return value meshes nicely with the append that receives it.

(define (flatten lists)               ; lists is not necessarily a list of lists anymore, 
  (if (not (pair? lists))             ; perhaps a better name should be chosen
      (list lists)
      (fold1 (lambda (right left)
               (append left (flatten right)))
             '()
             lists)))

Now we have

(flatten '(a (b (c)) (((d)))))
;=> (a b c d)
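
One edge case worth knowing about: '() is not a pair, so the version above wraps an empty sublist like any other atom and keeps it in the output. Whether that is desirable depends on what you need. The sketch below shows the behaviour and one possible variant that drops empty sublists; flatten-v1 and flatten-v2 are names made up here so both versions can coexist.

```scheme
(define (fold1 kons knil lst)
  (if (null? lst)
      knil
      (fold1 kons (kons (car lst) knil) (cdr lst))))

;; Same logic as the final flatten above, renamed for this comparison.
(define (flatten-v1 x)
  (if (not (pair? x))
      (list x)
      (fold1 (lambda (right left)
               (append left (flatten-v1 right)))
             '()
             x)))

(flatten-v1 '(a () b))
;=> (a () b)

;; Hypothetical variant: treat '() as "nothing to add" instead of as an atom.
(define (flatten-v2 x)
  (cond ((null? x) '())
        ((not (pair? x)) (list x))
        (else (fold1 (lambda (right left)
                       (append left (flatten-v2 right)))
                     '()
                     x))))

(flatten-v2 '(a () b))
;=> (a b)
```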


Answer 2:

The procedure shown is an implementation of fold:

In functional programming, fold – also known variously as reduce, accumulate, aggregate, compress, or inject – refers to a family of higher-order functions that analyze a recursive data structure and recombine through use of a given combining operation the results of recursively processing its constituent parts, building up a return value

Take note:

  • The kons parameter is a two-argument function that's used for "combining" the current element of the list being processed with the accumulated value
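  • The knil parameter is the initial accumulated value; on each recursive call it carries the result accumulated so far, and it is returned once the list is exhausted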

To see how this works, imagine for a moment that we have a function such as this:

(define (reverse knil lst)
  (if (null? lst)
      knil
      (reverse (cons (car lst) knil) (cdr lst))))

(reverse '() '(1 2 3 4))
=> '(4 3 2 1)

In the above, knil is used to accumulate the result, and it starts with a value of '() because we're building a list as output. The role of kons is played by cons, which builds lists. Let's see another example:

(define (add knil lst)
  (if (null? lst)
      knil
      (add (+ (car lst) knil) (cdr lst))))

(add 0 '(1 2 3 4))
=> 10

In the above, knil is used to accumulate the result, and it starts with a value of 0 because we're building a number as output. The role of kons is played by +, which adds numbers.

By now you must have realized that both examples share the same structure: both consume an input list, and the only things that change are how we "combine" the values pulled from the list and the starting accumulated value. If we're smart, we can factor out the parts that change into a higher-order procedure that receives them as parameters. Thus fold1 is born:

(define (fold1 kons knil lst)
  (if (null? lst)
      knil
      (fold1 kons (kons (car lst) knil) (cdr lst))))

Both of the above examples can be easily expressed in terms of fold1; just pass along the right parameters:

(define (reverse lst)
  (fold1 cons '() lst))

(define (add lst)
  (fold1 + 0 lst))
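
The same pattern extends to other accumulations by changing only kons and knil. The two procedures below are a small sketch; product and largest are names invented for this example, and largest assumes a non-empty list because it seeds the accumulator with the first element.

```scheme
(define (fold1 kons knil lst)
  (if (null? lst)
      knil
      (fold1 kons (kons (car lst) knil) (cdr lst))))

;; Multiply all elements; 1 is the starting value because it is neutral for *.
(define (product lst)
  (fold1 * 1 lst))

;; Largest element; assumes lst is non-empty (car of '() is an error).
(define (largest lst)
  (fold1 max (car lst) (cdr lst)))

(product '(1 2 3 4))
;=> 24

(largest '(3 9 2))
;=> 9
```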

Now for the second part of the question: if you want to flatten a list with fold1 you can try this:

(define (helper x lst)
  (if (pair? x)
      (fold1 helper lst x)
      (cons x lst)))

(define (flatten lst)
  (reverse (helper lst '())))

(flatten '(1 2 (3) (4 (5)) 6))
=> '(1 2 3 4 5 6)
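
It can help to look at what helper returns before the final reverse. Because every atom is consed onto the front of the accumulator, the atoms come out in reverse order of their appearance. The sketch below repeats fold1 and helper so that it runs on its own.

```scheme
(define (fold1 kons knil lst)
  (if (null? lst)
      knil
      (fold1 kons (kons (car lst) knil) (cdr lst))))

;; helper serves as its own kons: for a pair it folds itself over the
;; sublist, threading the same accumulator through; for an atom it conses.
(define (helper x lst)
  (if (pair? x)
      (fold1 helper lst x)
      (cons x lst)))

(helper '(1 2 (3) (4 (5)) 6) '())
;=> (6 5 4 3 2 1)

(reverse (helper '(1 2 (3) (4 (5)) 6) '()))
;=> (1 2 3 4 5 6)
```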


Answer 3:

The following code uses a named let and Racket's for loop to flatten a list whose elements may themselves be lists:

(define (myflatten ll)
  (define ol '())
  (let loop ((ll ll))
    (for ((i ll))
      (if (list? i)
          (loop i)
          (set! ol (cons i ol)))))
  (reverse ol))


(myflatten '(a () (b e (c)) (((d))))) 

Output:

'(a b e c d)

However, it uses set!, which is generally discouraged in functional code.

The for loop can also be replaced by named let recursion:

(define (myflatten ll) 
  (define ol '())
  (let outer ((ll ll))
    (let inner ((il ll))
      (cond
        [(empty? il)]
        [(list? (first il))
         (outer (first il))
         (inner (rest il))]
        [else
         (set! ol (cons (first il) ol))
         (inner (rest il))])))
  (reverse ol))
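
Both versions above rely on set!. As one possible alternative, the sketch below threads the accumulator through the recursion instead of mutating it, and it needs no final reverse because it processes the rest of the list before the current element. The names flatten-acc and myflatten-pure are made up for this example, and the code uses only car, cdr, and cons, so it should not depend on Racket-specific forms.

```scheme
;; flatten-acc returns the atoms of x, in order, followed by acc.
(define (flatten-acc x acc)
  (cond ((null? x) acc)                       ; empty list contributes nothing
        ((pair? x)                            ; flatten the tail first, then the head
         (flatten-acc (car x)
                      (flatten-acc (cdr x) acc)))
        (else (cons x acc))))                 ; an atom goes on the front

(define (myflatten-pure ll)
  (flatten-acc ll '()))

(myflatten-pure '(a () (b e (c)) (((d)))))
;=> (a b e c d)
```

Note that the recursion on the tail is not a tail call, so very long lists use stack space proportional to their length.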