Should Scala's map() behave differently when m

2019-02-17 12:08发布

In the Scala Collections framework, I think there are some behaviors that are counterintuitive when using map().

We can distinguish two kinds of transformations on (immutable) collections. Those whose implementation calls newBuilder to recreate the resulting collection, and those who go though an implicit CanBuildFrom to obtain the builder.

The first category contains all transformations where the type of the contained elements does not change. They are, for example, filter, partition, drop, take, span, etc. These transformations are free to call newBuilder and to recreate the same collection type as the one they are called on, no matter how specific: filtering a List[Int] can always return a List[Int]; filtering a BitSet (or the RNA example structure described in this article on the architecture of the collections framework) can always return another BitSet (or RNA). Let's call them the filtering transformations.

The second category of transformations need CanBuildFroms to be more flexible, as the type of the contained elements may change, and as a result of this, the type of the collection itself maybe cannot be reused: a BitSet cannot contain Strings; an RNA contains only Bases. Examples of such transformations are map, flatMap, collect, scanLeft, ++, etc. Let's call them the mapping transformations.

Now here's the main issue to discuss. No matter what the static type of the collection is, all filtering transformations will return the same collection type, while the collection type returned by a mapping operation can vary depending on the static type.

scala> import collection.immutable.TreeSet
import collection.immutable.TreeSet

scala> val treeset = TreeSet(1,2,3,4,5) // static type == dynamic type
treeset: scala.collection.immutable.TreeSet[Int] = TreeSet(1, 2, 3, 4, 5)

scala> val set: Set[Int] = TreeSet(1,2,3,4,5) // static type != dynamic type
set: Set[Int] = TreeSet(1, 2, 3, 4, 5)

scala> treeset.filter(_ % 2 == 0)
res0: scala.collection.immutable.TreeSet[Int] = TreeSet(2, 4) // fine, a TreeSet again

scala> set.filter(_ % 2 == 0)    
res1: scala.collection.immutable.Set[Int] = TreeSet(2, 4) // fine

scala> treeset.map(_ + 1)        
res2: scala.collection.immutable.SortedSet[Int] = TreeSet(2, 3, 4, 5, 6) // still fine

scala> set.map(_ + 1)    
res3: scala.collection.immutable.Set[Int] = Set(4, 5, 6, 2, 3) // uh?!

Now, I understand why this works like this. It is explained there and there. In short: the implicit CanBuildFrom is inserted based on the static type, and, depending on the implementation of its def apply(from: Coll) method, may or may not be able to recreate the same collection type.

Now my only point is, when we know that we are using a mapping operation yielding a collection with the same element type (which the compiler can statically determine), we could mimic the way the filtering transformations work and use the collection's native builder. We can reuse BitSet when mapping to Ints, create a new TreeSet with the same ordering, etc.

Then we would avoid cases where

for (i <- set) {
  val x = i + 1
  println(x)
}

does not print the incremented elements of the TreeSet in the same order as

for (i <- set; x = i + 1)
  println(x)

So:

  • Do you think this would be a good idea to change the behavior of the mapping transformations as described?
  • What are the inevitable caveats I have grossly overlooked?
  • How could it be implemented?

I was thinking about something like an implicit sameTypeEvidence: A =:= B parameter, maybe with a default value of null (or rather an implicit canReuseCalleeBuilderEvidence: B <:< A = null), which could be used at runtime to give more information to the CanBuildFrom, which in turn could be used to determine the type of builder to return.

1条回答
Explosion°爆炸
2楼-- · 2019-02-17 12:32

I looked again at it, and I think your problem doesn't arise from a particular deficiency of Scala collections, but rather a missing builder for TreeSet. Because the following does work as intended:

val list = List(1,2,3,4,5)
val seq1: Seq[Int] = list
seq1.map( _ + 1 ) // yields List

val vector = Vector(1,2,3,4,5)
val seq2: Seq[Int] = vector
seq2.map( _ + 1 ) // yields Vector

So the reason is that TreeSet is missing a specialised companion object/builder:

seq1.companion.newBuilder[Int]    // ListBuffer
seq2.companion.newBuilder[Int]    // VectorBuilder
treeset.companion.newBuilder[Int] // Set (oops!)

So my guess is, if you take proper provision for such a companion for your RNA class, you may find that both map and filter work as you wish...?

查看更多
登录 后发表回答