What algorithm is used by the Scala library method

2019-04-24 11:09发布

问题:

I have been looking at the Scala documentation but so far I have found no answer to my question, namely what sorting algorithm is used by the method

scala.collection.immutable.Vector.sorted

The documenation says it is a stable sort, but not the actual algorithm used. Is it a merge sort?

回答1:

The sorted method is implemented in SeqLike, and seems to use java.util.Arrays.sort for its sorting. It builds an array from the vector, then invokes Arrays.sort and then converts it back, it seems. According to the Java 6 documentation, it therefore uses quicksort:

The sorting algorithm is a tuned quicksort, adapted from Jon L. Bentley and M. Douglas McIlroy's "Engineering a Sort Function", Software-Practice and Experience, Vol. 23(11) P. 1249-1265 (November 1993). This algorithm offers n*log(n) performance on many data sets that cause other quicksorts to degrade to quadratic performance.

For Java 7, the algorithm seems to have change (again, citing the docs):

The sorting algorithm is a Dual-Pivot Quicksort by Vladimir Yaroslavskiy, Jon Bentley, and Joshua Bloch. This algorithm offers O(n log(n)) performance on many data sets that cause other quicksorts to degrade to quadratic performance, and is typically faster than traditional (one-pivot) Quicksort implementations.

Scala's SeqLike#sorted source (taken from GitHub):

/** Sorts this $coll according to an Ordering.
   *
   *  The sort is stable. That is, elements that are equal (as determined by
   *  `lt`) appear in the same order in the sorted sequence as in the original.
   *
   *  @see [[scala.math.Ordering]]
   *
   *  @param  ord the ordering to be used to compare elements.
   *  @return     a $coll consisting of the elements of this $coll
   *              sorted according to the ordering `ord`.
   */
  def sorted[B >: A](implicit ord: Ordering[B]): Repr = {
    val len = this.length
    val arr = new ArraySeq[A](len)
    var i = 0
    for (x <- this.seq) {
      arr(i) = x
      i += 1
    }
    java.util.Arrays.sort(arr.array, ord.asInstanceOf[Ordering[Object]])
    val b = newBuilder
    b.sizeHint(len)
    for (x <- arr) b += x
    b.result
  }


回答2:

I agree broadly with accepted answer. However i would like to add some points w.r.t Java 7 Arrays

The answer would slightly change w.r.t to changes brought on since JAVA 7 which uses a variation of QuickSort called DualPivotQuickSort to sort primitive values in java.util.Arrays

JAVA 7 Array Sorting is different for Primitives and Objects.

Note from Docs -

For Primitive value Arrays :

The sorting algorithm is a Dual-Pivot Quicksort by Vladimir Yaroslavskiy, Jon Bentley, and Joshua Bloch. This algorithm offers O(n log(n)) performance on many data sets that cause other quicksorts to degrade to quadratic performance, and is typically faster than traditional (one-pivot) Quicksort implementations.

For Objects Arrays :

The implementation was adapted from Tim Peters's list sort for Python ( TimSort). It uses techiques from Peter McIlroy's "Optimistic Sorting and Information Theoretic Complexity", in Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pp 467-474, January 1993.

SeqLike.sorted uses 
java.util.Arrays.sort(arr.array, ord.asInstanceOf[Ordering[Object]]) 
where arr.array is instead of ArraySeq.

ArraySeq in Scala stores its data in a plain old Java array, 
but it does not store arrays of primitives; everything is an array of objects. 
This means that primitives get boxed on the way in. 

So Java Arrays.sort should be using the TimSort !!!!!!

Additional Details -

You can find the JAVA7 implementation here - http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/java/util/Arrays.java

Further Awesome Reading on JAVA 7 - http://permalink.gmane.org/gmane.comp.java.openjdk.core-libs.devel/2628

The changes have impact on scala sorting as being discussed here - http://grokbase.com/t/gg/scala-internals/12ab76zqnk/specializing-ordering-faster-sort