Suppose I need to apply two functions, `f: String => A` and `g: A => B`, to each line in a large text file to eventually create a list of `B`.
Since the file is large and `f` and `g` are expensive, I would like to make the processing concurrent. I can use "parallel collections" and do something like `io.Source.fromFile("data.txt").getLines.toList.par.map(l => g(f(l)))`, but that does not execute reading the file, `f`, and `g` concurrently.
What is the best way to implement concurrency in this example?

You can use `map` on `Future`:
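
The code that originally followed this suggestion is not shown, so here is a minimal sketch of what `map` on `Future` could look like. The object name `FutureLines`, the method `process`, and the bodies of `f` and `g` are hypothetical stand-ins for the expensive functions in the question; the default global execution context is also an assumption.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object FutureLines {
  // Hypothetical stand-ins for the expensive f and g from the question.
  def f(line: String): Int = line.length
  def g(a: Int): String = "len=" + a

  // Start one Future per line as soon as the iterator yields it, and
  // chain g onto the result of f with map. Reading the next line can
  // proceed while earlier lines are still being processed.
  def process(lines: Iterator[String]): List[String] = {
    val futures: List[Future[String]] =
      lines.map(l => Future(f(l)).map(g)).toList
    // Future.sequence collects the results in input order.
    Await.result(Future.sequence(futures), Duration.Inf)
  }

  def main(args: Array[String]): Unit = {
    // Prints len=1, len=2, len=3, one per line.
    process(Iterator("a", "bb", "ccc")).foreach(println)
  }
}
```

For the file in the question, the iterator passed to `process` would be `io.Source.fromFile("data.txt").getLines()`. Note that this creates one `Future` per line and holds all of them in memory, which may or may not be acceptable for a very large file.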

First, an important note: don't use `.par` on `List`, since it requires copying all the data (a `List` can only be read sequentially). Instead, use something like `Vector`, for which the `.par` conversion can happen without the copying.

It seems like you're thinking about the parallelism the wrong way. Here's what would happen:
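If you have a file with a handful of lines, and functions `f` and `g` that each report the thread they run on, then you can map `l => g(f(l))` over the lines as a parallel `Vector` and look at the output.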
So even though the entire `g(f(l))` operation for a given line happens on a single thread, you can see that different lines may be processed in parallel. Thus, many `f` and `g` operations can be happening simultaneously on separate threads, but the `f` and `g` for a particular line will happen sequentially.

This is, after all, what you should expect, since there's no way it could read a line, run `f`, and run `g` on it in parallel. For example, how could it execute `g` on the output of `f` if the line hasn't been read yet?
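
The file, functions, and output from the walkthrough above are not shown, so the following is only a sketch of that kind of experiment, not the original code. `ParThreads`, `run`, and the tuple-returning `f` and `g` are hypothetical names. It assumes Scala 2.13 or later with the `scala-parallel-collections` module on the classpath; on Scala 2.12, where `.par` is built in, the import should be removed.

```scala
// Assumption: Scala 2.13+ with scala-parallel-collections available.
// On Scala 2.12, delete this import; .par is part of the standard library.
import scala.collection.parallel.CollectionConverters._

object ParThreads {
  // Hypothetical f: pairs the line with the thread that ran f.
  def f(line: String): (String, String) =
    (line, Thread.currentThread.getName)

  // Hypothetical g: appends the thread that ran g.
  def g(a: (String, String)): (String, String, String) =
    (a._1, a._2, Thread.currentThread.getName)

  // Vector is used so that .par does not have to copy the data.
  def run(lines: Vector[String]): Vector[(String, String, String)] =
    lines.par.map(l => g(f(l))).toVector

  def main(args: Array[String]): Unit = {
    // Thread names vary from run to run, so no fixed output is shown.
    run(Vector("a", "b", "c", "d")).foreach {
      case (line, fThread, gThread) =>
        println(s"$line: f on $fThread, g on $gThread")
    }
  }
}
```

For each line, the thread recorded by `f` and the thread recorded by `g` should be the same, while different lines may report different threads. With a real file, the input could be built as `io.Source.fromFile("data.txt").getLines().toVector`.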