Why does Clojure hang after having performed my ca

2020-03-01 06:58发布

I'm experimenting with filtering through elements in parallel. For each element, I need to perform a distance calculation to see if it is close enough to a target point. Never mind that data structures already exist for doing this, I'm just doing initial experiments for now.

Anyway, I wanted to run some very basic experiments where I generate random vectors and filter them. Here's my implementation that does all of this

(defn pfilter [pred coll]
  (map second
    (filter first
      (pmap (fn [item] [(pred item) item]) coll))))

(defn random-n-vector [n]
  (take n (repeatedly rand)))

(defn distance [u v]
  (Math/sqrt (reduce + (map #(Math/pow (- %1 %2) 2) u v))))

(defn -main [& args]
  (let [[n-str vectors-str threshold-str] args
        n (Integer/parseInt n-str)
        vectors (Integer/parseInt vectors-str)
        threshold (Double/parseDouble threshold-str)
        random-vector (partial random-n-vector n)
        u (random-vector)]
    (time (println n vectors 
      (count 
        (pfilter 
          (fn [v] (< (distance u v) threshold))
          (take vectors (repeatedly random-vector))))))))

The code executes and returns what I expect, that is the parameter n (length of vectors), vectors (the number of vectors) and the number of vectors that are closer than a threshold to the target vector. What I don't understand is why the programs hangs for an additional minute before terminating.

Here is the output of a run which demonstrates the error

$ time lein run 10 100000 1.0
     [null] 10 100000 12283
     [null] "Elapsed time: 3300.856 msecs"

real    1m6.336s
user    0m7.204s
sys 0m1.495s

Any comments on how to filter in parallel in general are also more than welcome, as I haven't yet confirmed that pfilter actually works.

1条回答
三岁会撩人
2楼-- · 2020-03-01 07:27

You need to call shutdown-agents to kill the threads backing the threadpool used by pmap.

About pfilter, it should work but run slower than filter, since your predicate is simple. Parallelization isn't free so you have to give each thread moderately intensive tasks to offset the multithreading overhead. Batch your items before filtering them.

查看更多
登录 后发表回答