How to split list into evenly sized chunks in Rack

2020-07-13 10:26发布

问题:

Example:
How to convert list:
'(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15)

Into list of lists:
'((0 1 2 3) (4 5 6 7) (8 9 10 11) (12 13 14 15))

Based on answers provided here so far, this is what I've come up with:

First define function to take up to 'n' elements from beginning of the list:

(define (take-up-to n xs)
  (define (iter xs n taken)
    (cond
      [(or (zero? n) (empty? xs)) (reverse taken)]
      [else (iter (cdr xs) (- n 1) (cons (car xs) taken))]))
  (iter xs n '()))

Second is similar function for the rest of list:

(define (drop-up-to n xs)
  (define (iter xs n taken)
    (cond
      [(or (zero? n) (empty? xs)) xs]
      [else (iter (cdr xs) (- n 1) (cons (car xs) taken))]))
  (iter xs n '()))

This could have been done as one function that returns two values and Racket has a function 'split-at' that produces same result, but I did this as an exercise.

ps. Is this correct use of tail recursion ?

Than split-into-chunks can be written like this:

(define (split-into-chunks n xs)
  (if (null? xs)
      '()
      (let ((first-chunk (take-up-to n xs))
            (rest-of-list (drop-up-to n xs)))
        (cons first-chunk (split-into-chunks n rest-of-list)))))

pps. Can this one be improved even more or is it 'good enough' ?

回答1:

There's a common utility function in Scheme, in the SRFI-1 library (which Racket offers, but I don't recall how to import it), called take, which takes the initial n elements from a list:

(take 4 '(0 1 2 3 4 5 6 7 8))
=> '(0 1 2 3)

There is also in the same library a function called drop which removes the initial n elements from a list:

(drop 4 '(0 1 2 3 4 5 6 7 8))
=> '(4 5 6 7 8)

You can break down the problem into smaller pieces by using functions like these. So, a first (but incorrect) approximation to solving your problem would be this:

(define (split-into-chunks n xs)
  (if (null? xs)
      '()
      (let ((first-chunk (take n xs))
            (rest (drop n xs)))
        (cons first-chunk (split-into-chunks n rest)))))

As I noted, however, this solution is suboptimal. Why? Because (take n xs) gives you an error when the list xs has fewer than n elements; translated to your problem, if the list has a non-n multiple of elements you get an error. However, you can fix this by writing a pair of functions, take-up-to and drop-up-to that work like take and drop but don't have that problem. So example usage of the functions would look like this:

(take-up-to 4 '(0 1 2))
=> '(0 1 2)

(drop-up-to 4 '(0 1 2))
=> '()

This is as much as I'm going to tell you. I suggest you do these things:

  • Write your own implementations of take, drop, take-up-to and drop-up-to, and use them to write the function you're trying to implement.
  • Skim through the documentation for the SRFI-1 library and familiarize yourself with what the functions there do. A lot of these list problems break down into easy combinations of these functions, so you want to know about them.
  • Learn how to import this library into Racket. (Can't help you there.)
  • As an exercise, try your hand at writing your own implementations of some of the SRFI-1 functions. (It's OK to simplify them a bit for the sake of the exercise; for example, while many of these functions will deal with multiple list arguments, it's OK for exercise purposes to write a version that deals with just one list.)

EDIT: Here's simple implementation of take-up-to:

(define (take-up-to n xs)
  (if (or (zero? n) (null? xs))
      '()
      (cons (car xs) (take-up-to (- n 1) (cdr xs)))))

It's possible to improve on this still some more to use only tail calls (and thus run in constant space). That's left as yet another exercise.



回答2:

as for me it's something like

(define (split-by lst n)
   (if (not (empty? lst))
       (cons (take lst n) (split-by (drop lst n) n))
       '() ))

for example

(split-by '(3 2 1 43 54 25 100 -14 -42) 3)

yields

'((3 2 1) (43 54 25) (100 -14 -42))


回答3:

You ask a nice general-purpose question, but I think that what you want here is something that uses byte-strings, rather than lists. Here's some code (including error checking), along with a test case:

#lang racket

(require rackunit)

;; given a byte string, split it into a vector of byte-strings
;; of length 4
(define (split-bytes bytes)
  (define len (bytes-length bytes))
  (unless (= 0 (modulo len 4))
    (raise-type-error 'split-bytes
                      "byte string of length divisible by 4"
                      0 bytes))
  (for/vector ([i (in-range (/ len 4))])
     (subbytes bytes (* 4 i) (* 4 (add1 i)))))

(check-equal?
 (split-bytes
  #"hhoh.h9ew,n49othsv97")
 (vector #"hhoh"
         #".h9e"
         #"w,n4"
         #"9oth"
         #"sv97"))


回答4:

If you'd like to get better at solving problems like this, I'd highly recommend The Little Schemer. It will train you to think in terms that will make these problems tractable, and it only takes a few hours to run through, cover-to-cover.



回答5:

Another way of doing that is to provide a higher-order function map-n, which takes n values from a list and applies a function to them:

(define (map-n n fn l . lists)
  (if (any (lambda(l)(< (length l) n)) (cons l lists))
      '()
      (cons (apply fn (append-map (lambda(l)(take l n)) (cons l lists)))
            (apply map-n n fn (map (lambda(l)(drop l n)) (cons l lists))))))

(e.g. (map-n 4 + '(1 2 3 4 5 6 7 8)) ===> (10 26))
(e.g. (map-n 3 (lambda (a b c) a) '(1 2 3 4 5 6)) ===> (1 4))

Having this function, one can simply

(define (split-by n l)
  (map-n n list l))

The disadvantage might be that if the length of the list isn't divisible by n, the remainder will be rejected from the result.

Another funky way is to create a function that splits a list into chunks of arbitrary size:

(define (chunk-list l . chunk-sizes)
  (assert (and (list? l)
               (every natual? chunk-sizes)
               (<= (fold + 0 chunk-sizes) (length l))))
  (let loop ((result '())
             (sizes chunk-sizes)
             (rest l))
    (match sizes
      (()
       (reverse result))
      ((size . sizes)
       (let-values (((this rest) (split-at rest size)))
         (loop `(,this ,@result) sizes rest))))))

(e.g. (chunk-list '(1 2 3 4 5 6 7 8 9 10) 0 1 2 3 4)
      ===> (() (1) (2 3) (4 5 6) (7 8 9 10))

and then define split-by using make-list:

(define (split-by n l)
  (let ((size (quotient (length l) n)))
    (apply chunk-list l (make-list size n))))

Note that the definition of map-n uses the any function from srfi-1, and chunk-list uses Alex Shinn's pattern matcher (although it could be easily rewritten using plain if, eq?, car and cdr)



回答6:

If you're looking for a tail-recursive solution, one approach is to use the named let:

(define (group n xs)
  (let loop ([grouped '()] [xs xs])
    (if (empty? xs)
        (reverse grouped)
        (loop (cons (take xs n) grouped)
              (drop xs n)))))

However, this fails if xs would have leftover elements, so we'll need to add a case that checks for this:

(define (group n xs)
  (let loop ([grouped '()] [xs xs])
    (cond
      [(empty? xs)
       (reverse grouped)]
      [(<= (length xs) n)
       (loop (cons xs grouped) '())]
      [else (loop (cons (take xs n) grouped)
                  (drop xs n))])))

This works, but we can do better. The problem here is that calculating (length xs) runs in linear time, because the only way to find the length of a simple list is to walk the entire list. Since this is inside a loop that is run a number of times proportional to the size of xs, this code runs in quadratic time when it should be simple to accomplish it in linear time. We can circumvent this problem by calculating the length of xs once and then subtracting n from it each iteration:

(define (group n xs)
  (let loop ([grouped '()] [xs xs] [l (length xs)])
    (cond
      [(empty? xs)
       (reverse grouped)]
      [(<= l n)
       (loop (cons xs grouped) '() 0)]
      [else (loop (cons (take xs n) grouped)
                  (drop xs n)
                  (- l n))])))

And we're back to linear time while still preserving tail recursion, and therefore constant space. There is another improvement we can make however. The Racket function split-at combines the functionality of take and drop and returns both lists as two values. To use multiple-value-returning functions, we can use let-values:

(define (group n xs)
  (let loop ([grouped '()] [xs xs] [l (length xs])
    (cond
      [(empty? xs)
       (reverse grouped)]
      [(<= l n)
       (loop (cons xs grouped) '() 0)]
      [else (let-values ([(taken dropped) (split-at xs n)])
              (loop (cons taken grouped)
                    dropped
                    (- l n)))])))

This is slightly faster because split-at can avoid the repeat work of ignoring the first n elements in the drop portion of it's functionality because those elements will already be consumed by take. This code does however not account for bad user input. If a user supplies an n larger than the length of xs, it will throw an error when it should return (list xs). This is simple enough to check for, but our code is getting awfully extended towards the right with all this nesting. In addition to checking for this, we can split the loop off into an internally defined function:

(define (group n xs)
  (define (loop grouped xs l)
    (cond
      [(empty? xs)
       (reverse grouped)]
      [(<= l n)
       (loop (cons xs grouped) '() 0)]
      [else (let-values ([(taken dropped) (split-at xs n)])
              (loop (cons taken grouped)
                    dropped
                    (- l n)))]))
  (let ([l (length xs)])
    (if (>= n l)
        (list xs)
        (loop '() xs l))))

This function is tail recursive, does not calculate (length xs) more than once, ensures that (group 4 '(1 2 3)) evaluates to '((1 2 3)), and uneven grouping such that (group 2 '(1 2 3) evaluates to '((1 2) (3)), running in linear time and constant space.