Scala Parallel Mergesort - Out of Memory

Posted 2019-11-01 20:21

I've tried to write a parallel Mergesort using Scala Futures. However, when I run my algorithm on a list of size 100 000 inside Eclipse's interpreter, everything gets very sluggish and eventually I get an error message telling me I'm out of memory. When I run it in the interpreter from the command line, it already hangs at lists of size 10 000 (but this time I get no error message).

Why does this happen and is there a fix?

import scala.actors.Future
import scala.actors.Futures._

object MergeSort{
    def sort[T <% Ordered[T]](toBeSorted :List[T]) :List[T] = toBeSorted match{
      case Nil => Nil
      case List(x) => List(x)
      case someList =>
        // Split in half: the left half is sorted in a new future, the right half on this thread
        val (left, right) = someList splitAt someList.length/2
        val sortedLeft = future { sort(left) }
        val sortedRight = sort(right)
        // sortedLeft() blocks until the left half is sorted, then both halves are merged
        merge(sortedLeft(), sortedRight, Nil)
    }

    // Merges two sorted lists; Ack accumulates the merged prefix in reverse order
    def merge[T <% Ordered[T]](a :List[T], b :List[T], Ack: List[T]) :List[T] = (a, b) match {
      case (Nil, ys) => Ack.reverse ++ ys
      case (xs, Nil) => Ack.reverse ++ xs
      case (x::xs, y::ys) if x < y => merge(xs, y::ys, x::Ack)
      case (x::xs, y::ys) => merge(x::xs, ys, y::Ack)
    }
}

Answer 1:

You should try using Akka Futures and tune the execution context to your needs:

  • http://doc.akka.io/docs/akka/2.0.1/scala/futures.html

It looks like the std-lib doesn't give you good defaults for a use case like yours.



Answer 2:

As Rex pointed out, the overhead of (any) Future API is considerable and should not be ignored.
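The question's sort creates one future for every split, all the way down to two-element lists, so a 100 000-element list spawns roughly 100 000 futures. A common way to amortize that overhead is a sequential cutoff: only fork work for large sublists and sort small ones directly. The following is a minimal sketch, assuming Scala 2.12 or later and the standard scala.concurrent Futures (SIP-14) instead of the deprecated scala.actors library; it also replaces the view bound T <% Ordered[T] with an implicit Ordering[T]. The names ThresholdMergeSort, seqSort, parSort and the Threshold value of 4096 are hypothetical choices for illustration.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object ThresholdMergeSort {
  // Hypothetical cutoff: sublists at or below this size are sorted
  // sequentially, since scheduling a Future costs more than sorting them.
  val Threshold: Int = 4096

  // Stable merge of two sorted lists; acc holds the merged prefix in reverse.
  @annotation.tailrec
  def merge[T](a: List[T], b: List[T], acc: List[T] = Nil)(implicit ord: Ordering[T]): List[T] =
    (a, b) match {
      case (Nil, ys) => acc.reverse ::: ys
      case (xs, Nil) => acc.reverse ::: xs
      case (x :: xs, y :: ys) =>
        if (ord.lteq(x, y)) merge(xs, b, x :: acc)
        else merge(a, ys, y :: acc)
    }

  // Ordinary single-threaded merge sort, used for small sublists.
  def seqSort[T](xs: List[T])(implicit ord: Ordering[T]): List[T] =
    if (xs.lengthCompare(1) <= 0) xs
    else {
      val (l, r) = xs.splitAt(xs.length / 2)
      merge(seqSort(l), seqSort(r))
    }

  // Splits on the calling thread, sorts each small leaf inside one Future,
  // and combines results with zip/map instead of blocking a pool thread.
  def parSort[T](xs: List[T])(implicit ord: Ordering[T], ec: ExecutionContext): Future[List[T]] =
    if (xs.lengthCompare(Threshold) <= 0) Future(seqSort(xs))
    else {
      val (l, r) = xs.splitAt(xs.length / 2)
      parSort(l).zip(parSort(r)).map { case (sl, sr) => merge(sl, sr) }
    }

  // Blocking entry point on the global pool (sized to the core count by default).
  def sort[T](xs: List[T])(implicit ord: Ordering[T]): List[T] = {
    import ExecutionContext.Implicits.global
    Await.result(parSort(xs), Duration.Inf)
  }
}
```

Because the recursion composes futures with zip and map rather than waiting on them inside a task, no pool thread sits blocked on another task, and the number of futures drops to about n / Threshold.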

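Don't waste precious CPU and memory on context-switching overhead. You should split the list into reasonably sized chunks and sort each chunk within a single thread.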

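For example, if your machine has 4 cores and 4 GB of memory, you could split the data into 500 MB chunks and run at most 4 merge sorts concurrently. This maximizes your throughput and parallelism.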

You can use a SIP-14 ExecutionContext to limit the number of threads used:

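import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, ExecutionContextExecutorService}

// Cap the pool at one thread per available core
private val GLOBAL_THREAD_LIMIT = Runtime.getRuntime.availableProcessors()
// Fixed-size pool picked up implicitly by every Future created in this scope
private lazy implicit val executionContext: ExecutionContextExecutorService =
  ExecutionContext.fromExecutorService(
    Executors.newFixedThreadPool(GLOBAL_THREAD_LIMIT))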
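Putting the chunking advice together with a fixed-size pool like the one above, a minimal sketch might look as follows. It assumes the data fits in memory as a Vector, uses one chunk per core by default, sorts each chunk with the standard library's sorted on a single pool thread, and then merges the sorted chunks on the calling thread. The object name ChunkedSort and its parallelism parameter are hypothetical, not part of any library.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object ChunkedSort {
  // Hypothetical helper: split xs into `parallelism` chunks, sort each chunk
  // sequentially in its own Future on a fixed pool, then merge the results.
  def sort[T](xs: Vector[T], parallelism: Int = Runtime.getRuntime.availableProcessors())(
      implicit ord: Ordering[T]): Vector[T] = {
    val pool = Executors.newFixedThreadPool(parallelism)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // Ceiling division so that at most `parallelism` chunks are produced
      val chunkSize = math.max(1, (xs.size + parallelism - 1) / parallelism)
      val sortedChunks: Vector[Future[Vector[T]]] =
        xs.grouped(chunkSize).toVector.map(chunk => Future(chunk.sorted))
      val chunks = Await.result(Future.sequence(sortedChunks), Duration.Inf)
      // Merge the sorted chunks one after another on the calling thread
      chunks.foldLeft(Vector.empty[T])((acc, c) => merge(acc, c))
    } finally pool.shutdown() // release the pool's non-daemon threads
  }

  // Stable two-way merge of sorted vectors
  def merge[T](a: Vector[T], b: Vector[T])(implicit ord: Ordering[T]): Vector[T] = {
    val out = Vector.newBuilder[T]
    var i = 0
    var j = 0
    while (i < a.length && j < b.length) {
      if (ord.lteq(a(i), b(j))) { out += a(i); i += 1 }
      else { out += b(j); j += 1 }
    }
    while (i < a.length) { out += a(i); i += 1 }
    while (j < b.length) { out += b(j); j += 1 }
    out.result()
  }
}
```

Merging the chunks sequentially costs O(n·k) for k chunks, which is acceptable when k is the core count; with many more chunks, a k-way merge using a priority queue scales better.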

By the way, I have implemented a parallel external merge sort with SIP-14 and explained the implementation details on my blog: http://blog.yunglinho.com/blog/2013/03/19/parallel-external-merge-sort/
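An external merge sort generally sorts runs that fit in memory, writes them out, and finishes with a k-way merge of the sorted runs. The sketch below shows only that final merge step, using scala.collection.mutable.PriorityQueue; it is a generic illustration, not the implementation from the blog post, and it assumes in-memory iterators where a real external sort would stream each run from a file. The name KWayMerge is hypothetical.

```scala
import scala.collection.mutable

object KWayMerge {
  // Merges already-sorted iterators into one sorted iterator. Here the runs
  // are ordinary in-memory iterators (an assumption for illustration).
  def merge[T](runs: Seq[Iterator[T]])(implicit ord: Ordering[T]): Iterator[T] = new Iterator[T] {
    // Heap entries are (current head, run index). mutable.PriorityQueue is a
    // max-heap, so the ordering is reversed to dequeue the smallest head first.
    // Equal heads from different runs come out in no particular order.
    private val heap = mutable.PriorityQueue.empty[(T, Int)](
      Ordering.by[(T, Int), T](_._1).reverse)
    private val its = runs.toIndexedSeq
    // Seed the heap with the first element of every non-empty run
    its.zipWithIndex.foreach { case (it, i) =>
      if (it.hasNext) heap.enqueue((it.next(), i))
    }

    def hasNext: Boolean = heap.nonEmpty

    def next(): T = {
      val (value, i) = heap.dequeue()
      // Refill from the run that just supplied the smallest element
      if (its(i).hasNext) heap.enqueue((its(i).next(), i))
      value
    }
  }
}
```

The heap never holds more than one element per run, so memory use depends on the number of runs rather than on the total data size.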



Source: Scala Parallel Mergesort - Out of Memory