I asked a related but very general question earlier (see especially this response).
This question is very specific. This is all the code I care about:
```python
result = {}
for line in open('input.txt'):
    key, value = parse(line)
    result[key] = value
```
The function `parse` is completely self-contained (i.e., it doesn't use any shared resources).
I have an Intel i7-920 CPU (4 cores, 8 threads; I think the threads are more relevant, but I'm not sure).
What can I do to make my program use all the parallel capabilities of this CPU?
I assume I can open this file for reading in 8 different threads without much performance penalty since disk access time is small relative to the total time.
This can be done using Ray, which is a library for writing parallel and distributed Python.
To run the code below, first create `input.txt` as follows.
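The file-creation snippet from the original answer was not preserved; any small file with one key/value pair per line will do. A minimal stand-in (the "keyN valueN" line format is an assumption carried through the sketches below):

```python
# Write a few placeholder "key value" lines for the example.
with open('input.txt', 'w') as f:
    for i in range(6):
        f.write('key%d value%d\n' % (i, i))
```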
Then you can process the file in parallel by adding the `@ray.remote` decorator to the `parse` function and executing many copies of it in parallel, as follows.
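The processing snippet was also not preserved; here is a minimal sketch, assuming a stand-in `parse` that sleeps for one second to simulate expensive work (the sleep and the line format are placeholders, not the asker's real function):

```python
import time
import ray

ray.init()

@ray.remote
def parse(line):
    # Stand-in for the asker's parse(): simulate an expensive,
    # self-contained computation, then split the line into (key, value).
    time.sleep(1)
    key, value = line.split(None, 1)
    return key, value.strip()

# Launch one Ray task per line; Ray schedules them across all cores.
futures = [parse.remote(line) for line in open('input.txt')]

# Gather the (key, value) pairs and assemble the result dict.
result = dict(ray.get(futures))
```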
Note that the optimal way to do this will depend on how long it takes to run the `parse` function. If it takes one second (as above), then parsing one line per Ray task makes sense. If it takes 1 millisecond, then it probably makes sense to parse a batch of lines (e.g., 100) per Ray task; a sketch of that batching appears below.

Your script is simple enough that the `multiprocessing` module could also be used; however, as soon as you want to do anything more complicated, or want to leverage multiple machines instead of just one, it will be much easier with Ray.
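Continuing the previous sketch (Ray already initialized), the batching variant might look like this; the chunk size and the `parse_chunk` helper are illustrative, not from the original answer:

```python
@ray.remote
def parse_chunk(lines):
    # Parse a whole batch of lines per task to amortize per-task overhead.
    pairs = []
    for line in lines:
        key, value = line.split(None, 1)
        pairs.append((key, value.strip()))
    return pairs

chunk = 100
lines = open('input.txt').readlines()
futures = [parse_chunk.remote(lines[i:i + chunk])
           for i in range(0, len(lines), chunk)]
result = dict(pair for batch in ray.get(futures) for pair in batch)
```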
See the Ray documentation.
As TokenMacGuy said, you can use the `multiprocessing` module. If you really need to parse a massive amount of data, you should check out the Disco project.

It really scales up for jobs where your parse() job is "pure" (i.e., doesn't use any shared resources) and is CPU-intensive. I tested a job on a single core and then compared it to running on 3 hosts with 8 cores each. It actually ran 24 times faster when run on the Disco cluster (note: tested on an unreasonably CPU-intensive job).
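No Disco snippet survives in the answer; a rough sketch modeled on Disco's classic word-count tutorial might look like this (the input URL, the inline parsing logic, and the line format are all placeholders; `input.txt` would first have to be made reachable by the cluster, e.g. via DDFS or HTTP):

```python
from disco.core import Job, result_iterator

def fun_map(line, params):
    # Placeholder parsing: split each line into one (key, value) pair.
    key, value = line.split(None, 1)
    yield key, value.strip()

if __name__ == '__main__':
    # Placeholder input URL; point this at wherever input.txt is hosted.
    job = Job().run(input=["http://example.com/input.txt"], map=fun_map)
    result = dict(result_iterator(job.wait()))
```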
You can use the `multiprocessing` module, but if parse() is quick, you won't get much performance improvement by doing that. For example:
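The answer's two snippets were not preserved; a minimal sketch with `multiprocessing.Pool.map` (the stand-in parse() and the line format are assumptions):

```python
import multiprocessing

def parse(line):
    # Stand-in for the asker's parse(); assumes "key value" lines.
    key, value = line.split(None, 1)
    return key, value.strip()

if __name__ == '__main__':
    with open('input.txt') as f:
        lines = f.readlines()
    with multiprocessing.Pool() as pool:
        result = dict(pool.map(parse, lines))
```

or this style, which yields results as workers finish instead of waiting for the whole batch:

```python
import multiprocessing

def parse(line):
    # Same stand-in parse() as above.
    key, value = line.split(None, 1)
    return key, value.strip()

if __name__ == '__main__':
    with open('input.txt') as f:
        lines = f.readlines()
    with multiprocessing.Pool() as pool:
        result = {}
        # chunksize batches lines per task to reduce IPC overhead.
        for key, value in pool.imap_unordered(parse, lines, chunksize=100):
            result[key] = value
```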
Either way, you need to understand the map/reduce paradigm: parse each line independently (map), then combine the per-line results into the final dict (reduce).
CPython does not easily provide the threading model you are looking for: the GIL keeps only one thread running Python bytecode at a time. You can get something similar using the `multiprocessing` module and a process pool. Such a solution could look something like this:
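The answer's snippet was not preserved; a minimal sketch under the same assumptions as above (stand-in parse(), "key value" lines) might be:

```python
from multiprocessing import Pool

def parse(line):
    # Stand-in for the asker's parse(); assumes "key value" lines.
    key, value = line.split(None, 1)
    return key, value.strip()

if __name__ == '__main__':
    # One worker per hardware thread on the i7-920.
    with Pool(processes=8) as pool:
        with open('input.txt') as f:
            # chunksize hands each worker a batch of lines at a time.
            result = dict(pool.map(parse, f, chunksize=100))
```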
Why that's the best way: a pool of worker processes sidesteps the GIL entirely, each line is parsed independently so no synchronization is needed, and batching lines per task keeps the inter-process communication overhead small relative to the parsing work.