I read the very interesting article on the architecture of the Scala 2.8 collections and I've been experimenting with it a little bit. For a start, I simply copied the final code for the nice RNA
example. Here it is for reference:
// Imports the example needs in order to compile
import collection.IndexedSeqLike
import collection.mutable.{Builder, ArrayBuffer}
import collection.generic.CanBuildFrom

abstract class Base
case object A extends Base
case object T extends Base
case object G extends Base
case object U extends Base

object Base {
  val fromInt: Int => Base = Array(A, T, G, U)
  val toInt: Base => Int = Map(A -> 0, T -> 1, G -> 2, U -> 3)
}

final class RNA private (val groups: Array[Int], val length: Int)
    extends IndexedSeq[Base] with IndexedSeqLike[Base, RNA] {

  import RNA._

  // Mandatory re-implementation of `newBuilder` in `IndexedSeq`
  override protected[this] def newBuilder: Builder[Base, RNA] =
    RNA.newBuilder

  // Mandatory implementation of `apply` in `IndexedSeq`
  def apply(idx: Int): Base = {
    if (idx < 0 || length <= idx)
      throw new IndexOutOfBoundsException
    Base.fromInt(groups(idx / N) >> (idx % N * S) & M)
  }

  // Optional re-implementation of foreach,
  // to make it more efficient.
  override def foreach[U](f: Base => U): Unit = {
    var i = 0
    var b = 0
    while (i < length) {
      b = if (i % N == 0) groups(i / N) else b >>> S
      f(Base.fromInt(b & M))
      i += 1
    }
  }
}

object RNA {
  private val S = 2            // number of bits in group
  private val M = (1 << S) - 1 // bitmask to isolate a group
  private val N = 32 / S       // number of groups in an Int

  def fromSeq(buf: Seq[Base]): RNA = {
    val groups = new Array[Int]((buf.length + N - 1) / N)
    for (i <- 0 until buf.length)
      groups(i / N) |= Base.toInt(buf(i)) << (i % N * S)
    new RNA(groups, buf.length)
  }

  def apply(bases: Base*) = fromSeq(bases)

  def newBuilder: Builder[Base, RNA] =
    new ArrayBuffer mapResult fromSeq

  implicit def canBuildFrom: CanBuildFrom[RNA, Base, RNA] =
    new CanBuildFrom[RNA, Base, RNA] {
      def apply(): Builder[Base, RNA] = newBuilder
      def apply(from: RNA): Builder[Base, RNA] = newBuilder
    }
}
Now, here's my problem. If I run this, everything's fine:
val rna = RNA(A, G, T, U)
println(rna.map(e => e)) // prints RNA(A, G, T, U)
but this code transforms the RNA to a Vector!
val rna: IndexedSeq[Base] = RNA(A, G, T, U)
println(rna.map(e => e)) // prints Vector(A, G, T, U)
This is a problem, as client code unaware of the RNA class may transform it back to a Vector instead when only mapping from Base to Base. Why is that so, and what are the ways to fix it?
P.S.: I've found a tentative answer (see below); please correct me if I'm wrong.

If the static type of the rna variable is IndexedSeq[Base], the automatically inserted CanBuildFrom cannot be the one defined in the RNA companion object, as the compiler is not supposed to know that rna is an instance of RNA.

So where does it come from? The compiler falls back on an instance of GenericCanBuildFrom, the one defined in the IndexedSeq object. GenericCanBuildFroms produce their builders by calling genericBuilder[B] on the originating collection, and a requirement for that generic builder is that it can produce generic collections that can hold any type B, since the return type of the function passed to a map() is not constrained.

In this case, RNA is only an IndexedSeq[Base] and not a generic IndexedSeq, so it's not possible to override genericBuilder[B] in RNA to return an RNA-specific builder: we would have to check at runtime whether B is Base or something else, but we cannot do that.

I think this explains why, in the question, we get a Vector back. As to how we can fix it, it's an open question…

Edit: Fixing this requires map() to know whether it's mapping to a subtype of A or not. A significant change in the collections library would be needed for this to happen. See the related question "Should Scala's map() behave differently when mapping to the same type?".

On why I think it's not a good idea to statically type to a weaker type than RNA. This should really be a comment (because it's more of an opinion), but that would be harder to read. From your comment to my comment:
filter does it because the compiler can statically guarantee it. If you keep elements from a particular collection, you end up with a collection of the same type. map cannot guarantee that; it depends on the function that is passed.

My point is more about the act of explicitly specifying a type and expecting more than it can deliver. As a user of the RNA collection, I may write code that depends on certain properties of this collection, such as its efficient memory representation.
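
The filter/map difference described above can be sketched without the RNA class. The snippet below is only an illustration under stated assumptions: it uses the standard library's immutable BitSet as a stand-in for RNA (both are specialised, compact collections with a fixed element type), and the names StaticTypeSketch, bits, kept and mapped are made up for this example. With the variable statically typed as Set[Int], filter should still rebuild the collection through the runtime class, while map is resolved against the static type and so is not expected to give back a BitSet.

```scala
import scala.collection.immutable.BitSet

// Hypothetical demo object; BitSet stands in for the RNA class here.
object StaticTypeSketch {
  // The static type is deliberately weaker than the runtime type.
  val bits: Set[Int] = BitSet(1, 2, 3)

  // filter only keeps existing elements, so the collection can rebuild itself.
  val kept: Set[Int] = bits.filter(_ > 1)

  // map is typed against Set[Int]; the builder is chosen without knowing
  // that the receiver is really a BitSet.
  val mapped: Set[Int] = bits.map(_ + 1)

  def main(args: Array[String]): Unit = {
    println("filter kept the BitSet: " + kept.isInstanceOf[BitSet])
    println("map kept the BitSet: " + mapped.isInstanceOf[BitSet])
  }
}
```

The elements are the same either way; only the runtime representation differs, which is exactly the property a caller loses by typing the value as the weaker collection.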

So let's assume I state in val rna: IndexedSeq[Base] that rna is just an IndexedSeq. A few lines later I call a method doSomething(rna) where I expect the efficient memory representation. What would be the best signature for that: def doSomething[T](rna: IndexedSeq[Base]): T or def doSomething[T](rna: RNA): T?

I think it should be the latter. But if that's the case, then the code won't compile because rna is not statically an RNA object. If the method signature should be the former, then in essence I'm saying that I don't care about the memory representation efficiency. So I think the act of specifying a weaker type explicitly but expecting a stronger behavior is a contradiction. Which is what you do in your example.

Now I do see that even if I did:

where somebody else wrote:

I would like to have rna2 be an RNA object, but that won't happen... It means that this somebody else should write a method that takes a CanBuildFrom if they want to have callers get more specific types:

Then I could call:
val rna2: RNA = doSomething(rna)(collection.breakOut)
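
As a rough illustration of the idea of a method that takes a CanBuildFrom, here is a minimal, self-contained sketch. It is not the code from this answer: the body of doSomething (adding 1 to each element) and the object name BreakOutSketch are hypothetical, and Int elements with a List target stand in for Base and RNA so the snippet needs nothing beyond the standard library. It assumes Scala 2.8 through 2.12, since CanBuildFrom and breakOut were removed in 2.13.

```scala
// Sketch for Scala 2.8 to 2.12 only (CanBuildFrom/breakOut are gone in 2.13).
import scala.collection.breakOut
import scala.collection.generic.CanBuildFrom

object BreakOutSketch {
  // Hypothetical library method: the result type To is decided by whichever
  // CanBuildFrom the caller supplies or the compiler finds implicitly.
  def doSomething[To](seq: IndexedSeq[Int])(
      implicit cbf: CanBuildFrom[IndexedSeq[Int], Int, To]): To =
    seq.map(_ + 1)(cbf)

  def main(args: Array[String]): Unit = {
    val xs: IndexedSeq[Int] = Vector(1, 2, 3)

    // No explicit builder: the implicit for IndexedSeq is used.
    val ys: IndexedSeq[Int] = doSomething(xs)

    // With breakOut, the expected type on the left selects the builder.
    val zs: List[Int] = doSomething(xs)(breakOut)

    println(ys)
    println(zs)
  }
}
```

For the RNA case, the same call shape should let breakOut pick up the implicit canBuildFrom from the RNA companion object, because the expected result type is RNA and CanBuildFrom is contravariant in its first type parameter.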