I'm using Spark 1.3.1 and I'm curious why Spark doesn't allow using array keys with map-side combining.
Here is the relevant piece of the combineByKey function:
if (keyClass.isArray) {
  if (mapSideCombine) {
    throw new SparkException("Cannot use map-side combining with array keys.")
  }
}
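For example, this is a minimal way to hit that check (assuming a live SparkContext named sc); reduceByKey delegates to combineByKey with map-side combining enabled by default:

val rdd = sc.parallelize(Seq((Array(1), 1), (Array(1), 1)))
// Throws org.apache.spark.SparkException:
//   Cannot use map-side combining with array keys.
rdd.reduceByKey(_ + _)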
Basically, for the same reason the default partitioner cannot partition array keys.
A Scala Array is just a wrapper around a Java array, and its hashCode doesn't depend on the content. This means that two arrays with exactly the same content are not equal:
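A quick REPL session illustrates both points (a sketch; the res numbers are illustrative, and the two identity hash codes will almost certainly differ, though a collision is not strictly impossible):

scala> val x = Array(1, 2, 3)
x: Array[Int] = Array(1, 2, 3)

scala> val y = Array(1, 2, 3)
y: Array[Int] = Array(1, 2, 3)

scala> x.hashCode == y.hashCode
res0: Boolean = false

scala> x == y
res1: Boolean = false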
As a result, Arrays cannot be used as meaningful keys. If you're not convinced, just check what happens when you use an Array as a key for a Scala Map:
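You should see something like this (a sketch of the expected REPL output): both entries are kept, because the two Array(1) keys are not considered equal:

scala> Map(Array(1) -> 1, Array(1) -> 2)
res2: scala.collection.immutable.Map[Array[Int],Int] = Map(Array(1) -> 1, Array(1) -> 2)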
If you want to use a collection as a key, you should use an immutable data structure like a Vector or a List.
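For example (expected output; since Vector compares by content, the second binding overwrites the first):

scala> Map(Array(1).toVector -> 1, Array(1).toVector -> 2)
res3: scala.collection.immutable.Map[Vector[Int],Int] = Map(Vector(1) -> 2)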
See also: