Idiomatic clojure for progress reporting?

2019-01-21 09:40发布

问题:

How should I monitor the progress of a mapped function in clojure?

When processing records in an imperative language I often print a message every so often to indicate how far things have gone, e.g. reporting every 1000 records. Essentially this is counting loop repetitions.

I was wondering what approaches I could take to this in clojure where I am mapping a function over my sequence of records. In this case printing the message (and even keeping count of the progress) seem to be essentially side-effects.

What I have come up with so far looks like:

(defn report
  [report-every val cnt]
  (if (= 0 (mod cnt report-every))
    (println "Done" cnt))
    val)

(defn report-progress
  [report-every aseq]
  (map (fn [val cnt] 
          (report report-every val cnt)) 
       aseq 
       (iterate inc 1)))

For example:

user> (doall (report-progress 2 (range 10)))
Done 2
Done 4
Done 6
Done 8
Done 10
(0 1 2 3 4 5 6 7 8 9)

Are there other (better) ways of achieving this effect?

Are there any pitfalls in what I am doing? (I think I am preserving laziness and not holding the head for example.)

回答1:

The great thing about clojure is you can attach the reporting to the data itself instead of the code that does the computing. This allows you to separate these logically distinct parts. Here is a chunk from my misc.clj that I find I use in just about every project:

(defn seq-counter 
  "calls callback after every n'th entry in sequence is evaluated. 
  Optionally takes another callback to call once the seq is fully evaluated."
  ([sequence n callback]
     (map #(do (if (= (rem %1 n) 0) (callback)) %2) (iterate inc 1) sequence))
  ([sequence n callback finished-callback]
     (drop-last (lazy-cat (seq-counter sequence n callback) 
                  (lazy-seq (cons (finished-callback) ())))))) 

then wrap the reporter around your data and then pass the result to the processing function.

(map process-data (seq-counter inc-progress input))


回答2:

I would probably perform the reporting in an agent. Something like this:

(defn report [a]
  (println "Done " s)
  (+ 1 s))

(let [reports (agent 0)]
  (map #(do (send reports report)
            (process-data %))
       data-to-process)


回答3:

I don't know of any existing ways of doing that, maybe it would be a good idea to browse clojure.contrib documentation to look if there's already something. In the meantime, I've looked at your example and cleared it up a little bit.

(defn report [cnt]
  (when (even? cnt)
    (println "Done" cnt)))

(defn report-progress []
  (let [aseq (range 10)]
    (doall (map report (take (count aseq) (iterate inc 1))))
    aseq))

You're heading in the right direction, even though this example is too simple. This gave me an idea about a more generalized version of your report-progress function. This function would take a map-like function, the function to be mapped, a report function and a set of collections (or a seed value and a collection for testing reduce).

(defn report-progress [m f r & colls]
  (let [result (apply m
                 (fn [& args]
                   (let [v (apply f args)]
                     (apply r v args) v))
                 colls)]
    (if (seq? result)
      (doall result)
      result)))

The seq? part is there only for use with reduce which doesn't necessarily returns a sequence. With this function, we can rewrite your example like this:

user> 
(report-progress
  map
  (fn [_ v] v)
  (fn [result cnt _]
    (when (even? cnt)
      (println "Done" cnt)))
  (iterate inc 1)
  (range 10))

Done 2
Done 4
Done 6
Done 8
Done 10
(0 1 2 3 4 5 6 7 8 9)

Test the filter function:

user> 
(report-progress
  filter
  odd?
  (fn [result cnt]
    (when (even? cnt)
      (println "Done" cnt)))
  (range 10))

Done 0
Done 2
Done 4
Done 6
Done 8
(1 3 5 7 9)

And even the reduce function:

user> 
(report-progress
  reduce
  +
  (fn [result s v]
    (when (even? s)
      (println "Done" s)))
  2
  (repeat 10 1))

Done 2
Done 4
Done 6
Done 8
Done 10
12


回答4:

I have had this problem with some slow-running apps (e.g. database ETL, etc). I solved it by adding the function (tupelo.misc/dot ...) to the tupelo library. Sample:

(ns xxx.core 
  (:require [tupelo.misc :as tm]))

(tm/dots-config! {:decimation 10} )
(tm/with-dots
  (doseq [ii (range 2345)]
    (tm/dot)
    (Thread/sleep 5)))

Output:

     0 ....................................................................................................
  1000 ....................................................................................................
  2000 ...................................
  2345 total

API docs for the tupelo.misc namespace can be found here.