As an exercise, I took these Scala and Java examples of Akka to port to Frege. While it works fine, it runs slower(11s) than Scala(540ms) counterpart.
module mmhelloworld.akkatutorialfregecore.Pi where
import mmhelloworld.akkatutorialfregecore.Akka
data PiMessage = Calculate |
Work {start :: Int, nrOfElements :: Int} |
Result {value :: Double} |
PiApproximation {pi :: Double, duration :: Duration}
data Worker = private Worker where
calculatePiFor :: Int -> Int -> Double
calculatePiFor !start !nrOfElements = loop start nrOfElements 0.0 f where
loop !curr !n !acc f = if n == 0 then acc
else loop (curr + 1) (n - 1) (f acc curr) f
f !acc !i = acc + (4.0 * fromInt (1 - (i `mod` 2) * 2) / fromInt (2 * i + 1))
onReceive :: Mutable s UntypedActor -> PiMessage -> ST s ()
onReceive actor Work{start=start, nrOfElements=nrOfElements} = do
sender <- actor.sender
self <- actor.getSelf
sender.tellSender (Result $ calculatePiFor start nrOfElements) self
data Master = private Master {
nrOfWorkers :: Int,
nrOfMessages :: Int,
nrOfElements :: Int,
listener :: MutableIO ActorRef,
pi :: Double,
nrOfResults :: Int,
workerRouter :: MutableIO ActorRef,
start :: Long } where
initMaster :: Int -> Int -> Int -> MutableIO ActorRef -> MutableIO UntypedActor -> IO Master
initMaster nrOfWorkers nrOfMessages nrOfElements listener actor = do
props <- Props.forUntypedActor Worker.onReceive
router <- RoundRobinRouter.new nrOfWorkers
context <- actor.getContext
workerRouter <- props.withRouter router >>= (\p -> context.actorOf p "workerRouter")
now <- currentTimeMillis ()
return $ Master nrOfWorkers nrOfMessages nrOfElements listener 0.0 0 workerRouter now
onReceive :: MutableIO UntypedActor -> Master -> PiMessage -> IO Master
onReceive actor master Calculate = do
self <- actor.getSelf
let tellWorker start = master.workerRouter.tellSender (work start) self
work start = Work (start * master.nrOfElements) master.nrOfElements
forM_ [0 .. master.nrOfMessages - 1] tellWorker
return master
onReceive actor master (Result newPi) = do
let (!newNrOfResults, !pi) = (master.nrOfResults + 1, master.pi + newPi)
when (newNrOfResults == master.nrOfMessages) $ do
self <- actor.getSelf
now <- currentTimeMillis ()
duration <- Duration.create (now - master.start) TimeUnit.milliseconds
master.listener.tellSender (PiApproximation pi duration) self
actor.getContext >>= (\context -> context.stop self)
return master.{pi=pi, nrOfResults=newNrOfResults}
data Listener = private Listener where
onReceive :: MutableIO UntypedActor -> PiMessage -> IO ()
onReceive actor (PiApproximation pi duration) = do
println $ "Pi approximation: " ++ show pi
println $ "Calculation time: " ++ duration.toString
actor.getContext >>= ActorContext.system >>= ActorSystem.shutdown
calculate nrOfWorkers nrOfElements nrOfMessages = do
system <- ActorSystem.create "PiSystem"
listener <- Props.forUntypedActor Listener.onReceive >>= flip system.actorOf "listener"
let constructor = Master.initMaster nrOfWorkers nrOfMessages nrOfElements listener
newMaster = StatefulUntypedActor.new constructor Master.onReceive
factory <- UntypedActorFactory.new newMaster
masterActor <- Props.fromUntypedFactory factory >>= flip system.actorOf "master"
masterActor.tell Calculate
getLine >> return () --Not to exit until done
main _ = calculate 4 10000 10000
Am I doing something wrong with Akka or is it something to do with laziness in Frege for being slow? For example, when I initially had fold
(strict fold) in place of loop
in Worker.calculatePiFor
, it took 27s.
Dependencies:
- Akka native definitions for Frege: Akka.fr
- Java helper to extend Akka classes since we cannot extend a class in Frege: Actors.java
I am not exactly familiar with Actors, but assuming that the tightest loop is indeed
loop
you could avoid passing functionf
as argument.For one, applications of passed functions cannot take advantage of the strictness of the actual passed function. Rather, code generation must assume conservatively that the passed function takes its arguments lazily and returns a lazy result.
Second, in our case you use
f
really just once here, so one can inline it. (This is how it is done in the scala code in the article you linked.)Look at the code generated for the tail recursion in the following sample code that mimics yours:
We get there:
You see here that
f
is called lazily which causes all the argument expressios to also be computed lazily. Note the number of method calls this requires! In your case the code should still be something like:This means, two closures are build with the boxed values acc and curr and then the result is computed, i.e. the function
f
gets called with the unboxed arguments, and the result gets again boxed, just to get unboxed again (forced) for the next loop.Now compare the following, where we just do not pass
f
but call it directly:We get:
Much better! Finally, in the case above we can do without a function call at all:
And this gets:
Please try this out and let us know what happens. The main boost in performance should come from not passing
f
, whereas the inlining will probably be done in the JIT anyway.The additional cost with
fold
is probably because you also had to create some list before applying it.