How to implement Future as Applicative in Scala?

Posted 2019-02-17 09:14

Suppose I need to run two concurrent computations, wait for both of them, and then combine their results. More specifically, I need to run f1: X1 => Y1 and f2: X2 => Y2 concurrently and then call f: (Y1, Y2) => Y to finally get a value of Y.

I can create future computations fut1: X1 => Future[Y1] and fut2: X2 => Future[Y2] and then compose them to get fut: (X1, X2) => Future[Y] using monadic composition.

The problem is that monadic composition implies sequential waiting: we wait for one future first and then for the other. For instance, if the first future takes 2 seconds to complete and the second takes just 1 second to fail, we waste 1 second.

Thus it looks like we need an applicative composition of the futures that waits until either both complete or at least one of them fails. Does that make sense? How would you implement <*> for futures?

4 answers
做个烂人
#2 · 2019-02-17 09:32

Your post seems to contain two more or less independent questions. I will address the concrete practical problem of running two concurrent computations first. The question about Applicative is answered at the very end.

Suppose you have two asynchronous functions:

val f1: X1 => Future[Y1]
val f2: X2 => Future[Y2]

And two values:

val x1: X1
val x2: X2  

Now you can start the computations in multiple different ways. Let's take a look at some of them.

Starting computations outside of for (parallel)

Suppose you do this:

val y1: Future[Y1] = f1(x1)
val y2: Future[Y2] = f2(x2)

Now, the computations f1 and f2 are already running. It does not matter in which order you collect the results. You could do it with a for-comprehension:

val y: Future[(Y1,Y2)] = for(res1 <- y1; res2 <- y2) yield (res1,res2)

Using the expressions y1 and y2 in the for-comprehension does not affect the order in which y1 and y2 are computed; they are still being computed in parallel.

Starting computations inside of for (sequential)

If we simply take the definitions of y1 and y2, and plug them into the for comprehension directly, we will still get the same result, but the order of execution will be different:

val y = for (res1 <- f1(x1); res2 <- f2(x2)) yield (res1, res2)

translates into

val y = f1(x1).flatMap{ res1 => f2(x2).map{ res2 => (res1, res2) } }

In particular, the second computation starts only after the first one has completed. This is usually not what one wants.

Here, a basic substitution principle is violated: replacing y1 and y2 by their definitions changes the behaviour, because creating a Future is itself a side effect that starts the computation. If there were no side effects, one could probably transform this version into the previous one, but in Scala one has to take care of the order of execution explicitly.
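
The difference between the two variants can be made visible with a small timing sketch. The helper names slow, millisFor, hoisted and inlined are hypothetical and chosen only for this illustration; blocking is used so that the sleeping tasks do not starve the global thread pool on machines with few cores.

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object EagerFutureDemo {
  // Hypothetical helper: sleeps for `ms` milliseconds, then returns `ms`.
  def slow(ms: Long): Future[Long] = Future { blocking(Thread.sleep(ms)); ms }

  // Runs `body` and returns the elapsed wall-clock time in milliseconds.
  private def millisFor(body: => Any): Long = {
    val start = System.nanoTime()
    body
    (System.nanoTime() - start) / 1000000
  }

  // Futures bound to vals first: both are already running when `for` begins.
  def hoisted(): Long = millisFor {
    val a = slow(300)
    val b = slow(300)
    Await.result(for (x <- a; y <- b) yield x + y, 5.seconds)
  }

  // Same expressions inlined: the second Future is created only after the first completes.
  def inlined(): Long = millisFor {
    Await.result(for (x <- slow(300); y <- slow(300)) yield x + y, 5.seconds)
  }

  def main(args: Array[String]): Unit =
    println(s"hoisted: ${hoisted()} ms, inlined: ${inlined()} ms")
}
```

On a typical machine the inlined variant should take roughly twice as long as the hoisted one, although the exact numbers depend on the scheduler.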

Zipping futures (parallel)

Futures respect products. There is a method Future.zip, which allows you to do this:

val y = f1(x1) zip f2(x2)

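This runs both computations in parallel. The resulting future completes when both are done and fails if either of them fails; depending on the Scala version, however, a failure of the second future may not be reported until the first one has completed.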
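
If the two results should be combined right away with a function f: (Y1, Y2) => Y, as in the question, zipWith can be used instead of zip followed by map. The following is only a sketch: it assumes Scala 2.12 or later (where Future.zipWith is available), and the concrete f1, f2 and f are hypothetical stand-ins.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object ZipWithSketch {
  // Hypothetical stand-ins for the asynchronous functions f1 and f2.
  val f1: Int => Future[Int] = x => Future(x + 1)
  val f2: String => Future[Int] = s => Future(s.length)
  // Hypothetical stand-in for the combining function f: (Y1, Y2) => Y.
  val f: (Int, Int) => String = (a, b) => s"$a and $b"

  // Both futures are created (and therefore started) before they are combined.
  def combined(x1: Int, x2: String): Future[String] =
    f1(x1).zipWith(f2(x2))(f)

  def main(args: Array[String]): Unit =
    println(Await.result(combined(1, "abc"), 5.seconds)) // prints "2 and 3"
}
```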

Demo

Here is a little script that demonstrates this behaviour (inspired by muhuk's post):

import scala.concurrent._
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import java.lang.Thread.sleep
import java.lang.System.{currentTimeMillis => millis}

var time: Long = 0

val x1 = 1
val x2 = 2

// this function just waits
val f1: Int => Future[Unit] = { 
  x => Future { sleep(x * 1000) }
}

// this function waits and then prints
// elapsed time
val f2: Int => Future[Unit] = {
  x => Future { 
    sleep(x * 1000)
    val elapsed = millis() - time
    printf("Time: %1.3f seconds\n", elapsed / 1000.0)
  }
}

/* Outside `for` */ {
  time = millis()
  val y1 = f1(x1)
  val y2 = f2(x2)
  val y = for(res1 <- y1; res2 <- y2) yield (res1,res2)
  Await.result(y, Duration.Inf)
}

/* Inside `for` */ {
  time = millis()
  val y = for(res1 <- f1(x1); res2 <- f2(x2)) yield (res1, res2)
  Await.result(y, Duration.Inf)
}

/* Zip */ {
  time = millis()
  val y = f1(x1) zip f2(x2)
  Await.result(y, Duration.Inf)
}

Output:

Time: 2.028 seconds
Time: 3.001 seconds
Time: 2.001 seconds

Applicative

Using this definition from your other post:

trait Applicative[F[_]] {
  def apply[A, B](f: F[A => B]): F[A] => F[B]
}

one could do something like this:

object FutureApplicative extends Applicative[Future] {
  def apply[A, B](ff: Future[A => B]): Future[A] => Future[B] = {
    fa => for ((f,a) <- ff zip fa) yield f(a)
  }
}

However, I'm not sure what this has to do with your concrete problem, or with understandable and readable code. A Future already is a monad (this is stronger than Applicative), and there is even built-in syntax for it, so I don't see any advantages in adding some Applicatives here.
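
For completeness, here is a sketch of how such an Applicative instance could be used to combine two futures with a two-argument function. The trait and instance are repeated so that the example is self-contained; the helper map2 is hypothetical and not part of the definition above.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object ApplicativeUsage {
  trait Applicative[F[_]] {
    def apply[A, B](f: F[A => B]): F[A] => F[B]
  }

  object FutureApplicative extends Applicative[Future] {
    def apply[A, B](ff: Future[A => B]): Future[A] => Future[B] =
      fa => (ff zip fa).map { case (f, a) => f(a) }
  }

  // Hypothetical helper: lifts f by applying its curried form one argument at a time.
  def map2[A, B, C](fa: Future[A], fb: Future[B])(f: (A, B) => C): Future[C] = {
    val partial: Future[B => C] =
      FutureApplicative.apply[A, B => C](Future.successful(f.curried))(fa)
    FutureApplicative.apply[B, C](partial)(fb)
  }

  def main(args: Array[String]): Unit = {
    // Both futures are created before map2 is called, so they run concurrently.
    val sum = map2(Future(2), Future(3))(_ + _)
    println(Await.result(sum, 5.seconds)) // prints 5
  }
}
```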

Evening l夕情丶
#3 · 2019-02-17 09:33

None of the methods in the other answers does the right thing when one future fails quickly and the other succeeds only after a long time: ideally, the combined future should fail as soon as the first failure occurs.

But such a method can be implemented manually:

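import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.util.{Failure, Success, Try}

def smartSequence[A](futures: Seq[Future[A]])(
  implicit ec: ExecutionContext
): Future[Seq[A]] = {
  val counter = new AtomicInteger(futures.size)
  val result = Promise[Seq[A]]()
  // With no futures there are no callbacks, so complete right away
  if (futures.isEmpty) result.trySuccess(Seq.empty)

  def attemptComplete(t: Try[A]): Unit = {
    val remaining = counter.decrementAndGet
    t match {
      // If one future fails, fail the result immediately
      case Failure(cause) => result tryFailure cause
      // If all futures have succeeded, complete successful result
      case Success(_) if remaining == 0 => 
        result completeWith Future.sequence(futures)
      case _ =>
    }
  }

  futures.foreach(_ onComplete attemptComplete)
  result.future
}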

Scalaz does a similar thing internally, so both f1 |@| f2 and List(f1, f2).sequence fail immediately after any of the futures fails.

Here is a quick test of the failing time for those methods:

import java.util.Date
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scalaz._, Scalaz._

object ReflectionTest extends App {
  def f1: Future[Unit] = Future {
    Thread.sleep(2000)
  }

  def f2: Future[Unit] = Future {
    Thread.sleep(1000)
    throw new RuntimeException("Failure")
  }

  def test(name: String)(
    f: (Future[Unit], Future[Unit]) => Future[Unit]
  ): Unit = {
    val start = new Date().getTime
    f(f1, f2).andThen {
      case _ => 
        println(s"Test $name completed in ${new Date().getTime - start}")
    }
    Thread.sleep(2200)
  }

  test("monadic") { (f1, f2) => for (v1 <- f1; v2 <- f2) yield () }

  test("zip") { (f1, f2) => (f1 zip f2).map(_ => ()) }

  test("Future.sequence") { 
    (f1, f2) => Future.sequence(Seq(f1, f2)).map(_ => ()) 
  }

  test("smartSequence") { (f1, f2) => smartSequence(Seq(f1, f2)).map(_ => ())}

  test("scalaz |@|") { (f1, f2) => (f1 |@| f2) { case _ => ()}}

  test("scalaz sequence") { (f1, f2) => List(f1, f2).sequence.map(_ => ())}

  Thread.sleep(30000)
}

And the result on my machine is:

Test monadic completed in 2281
Test zip completed in 2008
Test Future.sequence completed in 2007
Test smartSequence completed in 1005
Test scalaz |@| completed in 1003
Test scalaz sequence completed in 1005
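
The same idea can be specialised to two futures of different types, which is closer to what the question asks for. This is only a sketch: zipFailFast and demoElapsedMs are hypothetical names and not part of the standard library or Scalaz.

```scala
import scala.concurrent.{Await, ExecutionContext, Future, Promise, blocking}
import scala.concurrent.duration._
import scala.util.{Failure, Try}

object FailFastZip {
  // Hypothetical helper: like zip, but fails as soon as either future fails.
  def zipFailFast[A, B](fa: Future[A], fb: Future[B])(
      implicit ec: ExecutionContext): Future[(A, B)] = {
    val result = Promise[(A, B)]()
    // A failure on either side fails the result at once; later attempts are ignored.
    fa.onComplete {
      case Failure(e) => result.tryFailure(e)
      case _          => ()
    }
    fb.onComplete {
      case Failure(e) => result.tryFailure(e)
      case _          => ()
    }
    // The success path completes only once both values are available.
    result.completeWith(fa zip fb)
    result.future
  }

  // Demo: a slow success combined with a quick failure; returns elapsed milliseconds.
  def demoElapsedMs(): Long = {
    import ExecutionContext.Implicits.global
    val slow = Future { blocking(Thread.sleep(1000)); 1 }
    val quick = Future[Int] {
      blocking(Thread.sleep(100))
      throw new RuntimeException("Failure")
    }
    val start = System.nanoTime()
    val outcome = Try(Await.result(zipFailFast(slow, quick), 5.seconds))
    assert(outcome.isFailure)
    (System.nanoTime() - start) / 1000000
  }

  def main(args: Array[String]): Unit =
    println(s"failed after ${demoElapsedMs()} ms")
}
```

In the demo the combined future should fail shortly after the quick future does, well before the slow one has finished.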
smile是对你的礼貌
#4 · 2019-02-17 09:38

"The problem is that monadic composition implies sequential wait. In our case it implies that we wait for one future first and then we will wait for another."

This is unfortunately true.

import java.util.Date
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object Test extends App {
        def timestamp(label: String): Unit = Console.println(label + ": " + new Date().getTime.toString)

        timestamp("Start")
        for {
                step1 <- Future {
                        Thread.sleep(2000)
                        timestamp("step1")
                }
                step2 <- Future {
                        Thread.sleep(1000)
                        timestamp("step2")
                }
        } yield { timestamp("Done") }

        Thread.sleep(4000)
}

Running this code outputs:

Start: 1430473518753
step1: 1430473520778
step2: 1430473521780
Done: 1430473521781

"Thus it looks like we need an applicative composition of the futures to wait till either both complete or at least one future fails."

I am not sure applicative composition has anything to do with the concurrency strategy. Using for-comprehensions, you get a result if all futures complete, or a failure if any of them fails. So it's semantically the same.

Why Are They Running Sequentially

I think the reason the futures run sequentially is that step1 is available within step2 (and in the rest of the computation). Essentially, we can rewrite the for block as:

def step1() = Future {
    Thread.sleep(2000)
    timestamp("step1")
}
def step2() = Future {
    Thread.sleep(1000)
    timestamp("step2")
}
def finalStep() = timestamp("Done")
step1().flatMap(step1 => step2().map(step2 => finalStep()))

So the results of previous computations are available to the rest of the steps. It differs from <?> & <*> in this respect.

How To Run Futures In Parallel

@andrey-tyukin's code runs futures in parallel:

import java.util.Date
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object Test extends App {
        def timestamp(label: String): Unit = Console.println(label + ": " + new Date().getTime.toString)

        timestamp("Start")
        (Future {
                Thread.sleep(2000)
                timestamp("step1")
        } zip Future {
                Thread.sleep(1000)
                timestamp("step2")
        }).map(_ => timestamp("Done"))
        Thread.sleep(4000)
}

Output:

Start: 1430474667418
step2: 1430474668444
step1: 1430474669444
Done: 1430474669446
兄弟一词,经得起流年.
#5 · 2019-02-17 09:42

It need not be sequential. The future's computation may start the moment the future is created. Of course, if the future is created inside the function passed to flatMap (and it will necessarily be so if it needs the result of the first computation), then it will be sequential. But in code such as

val f1 = Future {....}
val f2 = Future {....}
for (a1 <- f1; a2 <- f2) yield f(a1, a2)

you get concurrent execution.

So the implementation of Applicative implied by Monad is ok.
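
As a sketch of what this means, <*> can be written with flatMap and map alone. The name ap is hypothetical and used only for illustration. Because both arguments are ordinary (strict) parameters, the two futures already exist, and are therefore already running, when ap is called. Note that this version inspects the second future only after the first one has completed.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object MonadicAp {
  // <*> expressed through flatMap and map, i.e. the Applicative implied by the Monad.
  def ap[A, B](ff: Future[A => B])(fa: Future[A])(
      implicit ec: ExecutionContext): Future[B] =
    ff.flatMap(f => fa.map(f))

  def main(args: Array[String]): Unit = {
    import ExecutionContext.Implicits.global
    // Both futures are created up front, so they run concurrently.
    val inc: Future[Int => Int] = Future((x: Int) => x + 1)
    val arg: Future[Int] = Future(41)
    println(Await.result(ap(inc)(arg), 5.seconds)) // prints 42
  }
}
```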
