I have a large array and need to compute a result for each of its items. My PC's processor has 2 cores. I compared different ways to achieve parallel execution in Kotlin.
I wrote a simple example to illustrate this. The first approach is a Java parallel stream, the second is a plain Kotlin map, and the third is a coroutine version of map.
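    import java.util.stream.Collectors
    import kotlin.system.measureTimeMillis
    import kotlinx.coroutines.async
    import kotlinx.coroutines.coroutineScope
    import kotlinx.coroutines.runBlocking

    fun p() = runBlocking {
        val num = (0 until 1_000_000).toList()
        // 1. Java parallel stream
        println(measureTimeMillis {
            num.stream().parallel().map { it * 2 }.collect(Collectors.toList())
        })
        // 2. Plain (sequential) Kotlin map
        println(measureTimeMillis {
            num.map { it * 2 }
        })
        // 3. Coroutine-based map: one async coroutine per element
        println(measureTimeMillis {
            num.pmap { it * 2 }
        })
    }

    suspend fun <A, B> Iterable<A>.pmap(f: suspend (A) -> B): List<B> = coroutineScope {
        map { async { f(it) } }.map { it.await() }
    }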
The output (in ms), in the same order as above:
152
64
1620
Why is the pmap version so slow? How can I improve the code?
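For reference, here is one direction I could imagine trying, as an unmeasured sketch rather than a confirmed fix. Two things stand out in pmap: it starts one coroutine per element (a million of them), and because it is called from a plain runBlocking, all of those coroutines inherit runBlocking's single-threaded event loop, so nothing actually runs in parallel. The variant below switches to Dispatchers.Default and gives each coroutine a batch of elements. The name pmapChunked and the chunkSize default of 10_000 are my own assumptions for illustration and have not been tuned.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Hypothetical helper (not part of kotlinx.coroutines): maps the list in batches,
// one coroutine per batch, on the multi-threaded Dispatchers.Default pool.
suspend fun <A, B> List<A>.pmapChunked(
    chunkSize: Int = 10_000, // assumed value, not tuned
    f: suspend (A) -> B
): List<B> = withContext(Dispatchers.Default) {
    chunked(chunkSize)                                   // split into batches
        .map { chunk -> async { chunk.map { f(it) } } }  // one coroutine per batch
        .awaitAll()                                      // wait for every batch, keeping order
        .flatten()                                       // back to a single flat list
}

fun main() = runBlocking {
    val num = (0 until 1_000_000).toList()
    val doubled = num.pmapChunked { it * 2 }
    println(doubled.size)    // 1000000
    println(doubled.take(3)) // [0, 2, 4]
}
```

For an operation as cheap as it * 2, the cost of splitting and merging may still outweigh any gain from a second core, so I would not assume this beats the plain map without measuring it.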