Into or vec: converting sequence back to vector in Clojure

Posted 2020-05-23 03:25

Question:

I have the following code which increments the first element of every pair in a vector:

(vec (map (fn [[key value]] [(inc key) value]) [[0 :a] [1 :b]]))

However, I fear this code is inelegant, as it first creates a sequence using map and then converts it back to a vector.

Consider this analog:

(into [] (map (fn [[key value]] [(inc key) value]) [[0 :a] [1 :b]]))

On #clojure@irc.freenode.net I was told that using the code above is bad, because into expands into (reduce conj [] (map ...)), which produces many intermediate objects in the process. Then I was told that into doesn't actually expand into (reduce conj ...) and uses transients when it can. Measuring elapsed time also showed that into is actually faster than vec.

So my questions are:

  1. What is the proper way to use map over vectors?
  2. What happens underneath when I use vec and into with vectors?

Related but not duplicate questions:

  • Clojure: sequence back to vector
  • How To Turn a Reduce-Realized Sequence Back Into Lazy Vector Sequence

Answer 1:

Actually, as of Clojure 1.4.0 the preferred way of doing this is to use mapv, which is like map except that its return value is a vector. It is by far the most efficient approach, with no unnecessary intermediate allocations at all.
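As a minimal sketch, here is the code from the question rewritten with mapv. The names pairs and incremented are only illustrative.

```clojure
;; The input from the question: a vector of [key value] pairs.
(def pairs [[0 :a] [1 :b]])

;; mapv applies the function eagerly and returns a vector directly,
;; so no separate vec or into step is needed afterwards.
(def incremented
  (mapv (fn [[key value]] [(inc key) value]) pairs))

;; incremented is [[1 :a] [2 :b]], and (vector? incremented) is true
```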

Clojure 1.5.0 will bring a new reducers library which will provide a generic way to map, filter, take, drop etc. while creating vectors, usable with into []. You can play with it in the 1.5.0 alphas and in the recent tagged releases of ClojureScript.
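For illustration, assuming Clojure 1.5.0 or later where clojure.core.reducers is available, the same transformation could look roughly like this. The alias r and the names pairs and result are arbitrary choices for the sketch.

```clojure
(require '[clojure.core.reducers :as r])

(def pairs [[0 :a] [1 :b]])

;; r/map does not build a lazy sequence; it returns a reducible
;; that into [] consumes while pouring elements into the vector.
(def result
  (into [] (r/map (fn [[key value]] [(inc key) value]) pairs)))

;; result is [[1 :a] [2 :b]]
```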

As for (vec some-seq) and (into [] some-seq), the first ultimately delegates to a Java loop which pours some-seq into an empty transient vector, while the second does the same thing in very efficient Clojure code. In both cases there are some initial checks involved to determine which approach to take when constructing the final return value.
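The following sketch shows that both calls produce the same vector, together with a hand-written approximation of the transient-based pouring described above. The by-hand form is a simplification for illustration, not the actual source of either function; it leaves out the initial checks and metadata handling.

```clojure
(def some-seq (map inc (range 5)))

(def via-vec  (vec some-seq))
(def via-into (into [] some-seq))

;; Roughly what happens underneath: start from an empty transient
;; vector, conj! every element onto it, then make it persistent again.
(def by-hand
  (persistent! (reduce conj! (transient []) some-seq)))

;; All three are equal to [1 2 3 4 5].
```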

vec and into [] are significantly different for Java arrays of small length (up to 32) -- the first will alias the array (use it as the tail of the newly created vector) and demands that the array not be modified subsequently, lest the contents of the vector change (see the docstring); the latter creates a new vector with a new tail and doesn't care about future changes to the array.
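A small experiment along these lines can make the difference visible. The names arr, aliased and copied are illustrative, and the aliasing outcome is an implementation detail that the vec docstring only warns about, so treat it as an observation rather than a guarantee.

```clojure
;; A short Java Object array (length well under 32).
(def arr (object-array [1 2 3]))

(def aliased (vec arr))      ; may reuse arr as the vector's tail
(def copied  (into [] arr))  ; pours the elements into a fresh vector

;; Mutating the array after the fact is exactly what the vec
;; docstring tells you not to do.
(aset arr 0 :changed)

;; copied is still [1 2 3]; aliased may now show :changed as its
;; first element, because it can share storage with arr.
```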