I have been looking at the Scala documentation but so far I have found no answer to my question, namely what sorting algorithm is used by the method
scala.collection.immutable.Vector.sorted
The documenation says it is a stable sort, but not the actual algorithm used. Is it a merge sort?
The sorted
method is implemented in SeqLike
, and seems to use java.util.Arrays.sort
for its sorting. It builds an array from the vector, then invokes Arrays.sort
and then converts it back, it seems. According to the Java 6 documentation, it therefore uses quicksort:
The sorting algorithm is a tuned quicksort, adapted from Jon L. Bentley and M. Douglas McIlroy's "Engineering a Sort Function", Software-Practice and Experience, Vol. 23(11) P. 1249-1265 (November 1993). This algorithm offers n*log(n) performance on many data sets that cause other quicksorts to degrade to quadratic performance.
For Java 7, the algorithm seems to have change (again, citing the docs):
The sorting algorithm is a Dual-Pivot Quicksort by Vladimir Yaroslavskiy, Jon Bentley, and Joshua Bloch. This algorithm offers O(n log(n)) performance on many data sets that cause other quicksorts to degrade to quadratic performance, and is typically faster than traditional (one-pivot) Quicksort implementations.
Scala's SeqLike#sorted
source (taken from GitHub):
/** Sorts this $coll according to an Ordering.
*
* The sort is stable. That is, elements that are equal (as determined by
* `lt`) appear in the same order in the sorted sequence as in the original.
*
* @see [[scala.math.Ordering]]
*
* @param ord the ordering to be used to compare elements.
* @return a $coll consisting of the elements of this $coll
* sorted according to the ordering `ord`.
*/
def sorted[B >: A](implicit ord: Ordering[B]): Repr = {
val len = this.length
val arr = new ArraySeq[A](len)
var i = 0
for (x <- this.seq) {
arr(i) = x
i += 1
}
java.util.Arrays.sort(arr.array, ord.asInstanceOf[Ordering[Object]])
val b = newBuilder
b.sizeHint(len)
for (x <- arr) b += x
b.result
}
I agree broadly with accepted answer. However i would like to add some points w.r.t Java 7 Arrays
The answer would slightly change w.r.t to changes brought on since JAVA 7 which uses a variation of QuickSort called DualPivotQuickSort
to sort primitive values in java.util.Arrays
JAVA 7 Array Sorting is different for Primitives
and Objects
.
Note from Docs -
For Primitive value Arrays :
The sorting algorithm is a Dual-Pivot Quicksort by Vladimir Yaroslavskiy, Jon Bentley, and Joshua Bloch. This algorithm offers O(n log(n)) performance on many data sets that cause other quicksorts to degrade to quadratic performance, and is typically faster than traditional (one-pivot) Quicksort implementations.
For Objects Arrays :
The implementation was adapted from Tim Peters's list sort for Python ( TimSort). It uses techiques from Peter McIlroy's "Optimistic Sorting and Information Theoretic Complexity", in Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pp 467-474, January 1993.
SeqLike.sorted uses
java.util.Arrays.sort(arr.array, ord.asInstanceOf[Ordering[Object]])
where arr.array is instead of ArraySeq.
ArraySeq in Scala stores its data in a plain old Java array,
but it does not store arrays of primitives; everything is an array of objects.
This means that primitives get boxed on the way in.
So Java Arrays.sort should be using the TimSort !!!!!!
Additional Details -
You can find the JAVA7 implementation here - http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/java/util/Arrays.java
Further Awesome Reading on JAVA 7 - http://permalink.gmane.org/gmane.comp.java.openjdk.core-libs.devel/2628
The changes have impact on scala sorting as being discussed here - http://grokbase.com/t/gg/scala-internals/12ab76zqnk/specializing-ordering-faster-sort