I am working on a multithreaded number-crunching app; let's call it myprogram. I plan to run myprogram on IBM's LSF grid. LSF allows a job to be scheduled on CPUs from different machines. For example, bsub -n 3 ... myprogram ... can allocate two CPUs from node1 and one CPU from node2.
I know that I can ask LSF to allocate all 3 cores on the same node, but I am interested in the case where my job is scheduled onto different nodes.
How does LSF manage this?
1. Will myprogram be run as two different processes on node1 and node2?
2. Does LSF automatically manage data transfer between node1 and node2?
3. Is there anything I can do in myprogram to make this easy for LSF to manage? Should I be making use of any LSF libraries?
Answer to Q1
When you submit a job like bsub -n 3 myprogram, all LSF does is allocate 3 slots across 1-3 hosts. One of these hosts is designated as the "first execution host", and LSF dispatches and runs a single instance of myprogram on that host. Because that is one process, all of its threads also run on that one host.
If you want to run myprogram in parallel across the allocated hosts, LSF has a command called blaunch, which essentially launches one instance of a program per allocated slot (core). For example, submitting your job as bsub -n 3 blaunch myprogram will run 3 instances of myprogram.
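To see the allocation from inside the job, LSF exports it in environment variables such as LSB_MCPU_HOSTS, which lists each host followed by its slot count (for the example above, something like "node1 2 node2 1"). Below is a minimal C sketch, assuming that format, that parses the list: the first entry is the first execution host where a plain bsub runs the single instance of myprogram, and the total slot count is how many instances a blaunch launch would start. The names host_slots, parse_mcpu_hosts and total_slots are hypothetical helpers for illustration, not part of any LSF API.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_HOSTS 64
#define MAX_NAME 256

/* One entry of the allocation: a host name and the slots LSF gave us on it. */
struct host_slots {
    char host[MAX_NAME];
    int slots;
};

/* Parse an LSB_MCPU_HOSTS-style string such as "node1 2 node2 1".
 * Returns the number of hosts stored in `out`, or -1 if the string is
 * malformed or lists more than `max_hosts` hosts. */
int parse_mcpu_hosts(const char *spec, struct host_slots *out, int max_hosts)
{
    assert(spec != NULL && out != NULL);
    int n = 0;
    int consumed = 0;
    char name[MAX_NAME];
    int slots;

    /* Read "<host> <count>" pairs; %n records how far sscanf got. */
    while (sscanf(spec, "%255s %d%n", name, &slots, &consumed) == 2) {
        if (n == max_hosts || slots <= 0)
            return -1;
        strcpy(out[n].host, name);
        out[n].slots = slots;
        n++;
        spec += consumed;
    }

    /* Anything left other than whitespace means a dangling host name
     * or a non-numeric slot count. */
    while (isspace((unsigned char)*spec))
        spec++;
    return *spec == '\0' ? n : -1;
}

/* Total slots across all hosts (normally the value passed to bsub -n). */
int total_slots(const struct host_slots *hosts, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += hosts[i].slots;
    return total;
}
```

Inside a job you would pass getenv("LSB_MCPU_HOSTS") (after checking it is not NULL) to parse_mcpu_hosts. Knowing about the slot on node2 does not let a single myprogram process run threads there; it only tells the program what was allocated.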
Answer to Q2
By "manage data transfer" I assume you mean communication between the instances of myprogram. The answer is no: LSF is a scheduling and dispatching tool. It handles allocation and dispatch, but it has no knowledge of what the dispatched program is doing. blaunch, in turn, is simply a task launcher; it just launches multiple instances of a task.
What you're after here is a parallel programming framework such as MPI (see, for example, www.openmpi.org). It provides a set of APIs and commands that let you write myprogram in a parallel fashion.
Once you've done that and turned your program into mympiprogram, you can submit it to LSF with bsub -n 3 mpirun mympiprogram. The mpirun tool, at least in the case of Open MPI (and some other implementations), integrates with LSF and uses the blaunch interface under the hood to launch your tasks for you.
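As a rough illustration, here is a minimal C sketch of what mympiprogram could look like, assuming an MPI implementation such as Open MPI (with its mpicc compiler wrapper) is installed. Each MPI rank works on its own block of a hypothetical workload (summing squares), and MPI_Reduce sends the partial results to rank 0; that inter-node data transfer is exactly what LSF itself does not do. The helper names block_range and crunch and the USE_MPI macro are made up for this example; the MPI main is only compiled with -DUSE_MPI so the decomposition logic can be checked without an MPI installation.

```c
#include <assert.h>
#include <stdio.h>
#ifdef USE_MPI
#include <mpi.h>
#endif

/* Split [0, n) into `size` nearly equal blocks and return rank `rank`'s
 * share as [*lo, *hi). The first n % size ranks get one extra element. */
void block_range(long n, int rank, int size, long *lo, long *hi)
{
    assert(size > 0 && rank >= 0 && rank < size);
    long base = n / size;
    long extra = n % size;
    *lo = rank * base + (rank < extra ? rank : extra);
    *hi = *lo + base + (rank < extra ? 1 : 0);
}

/* Hypothetical "number crunching" on one block: sum of i*i over [lo, hi). */
long long crunch(long lo, long hi)
{
    long long acc = 0;
    for (long i = lo; i < hi; i++)
        acc += (long long)i * i;
    return acc;
}

#ifdef USE_MPI
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    char host[MPI_MAX_PROCESSOR_NAME];
    int len;
    MPI_Get_processor_name(host, &len);

    /* Each rank crunches only its own block of the problem. */
    const long n = 1000000;
    long lo, hi;
    block_range(n, rank, size, &lo, &hi);
    long long local = crunch(lo, hi);
    printf("rank %d of %d on %s: [%ld, %ld)\n", rank, size, host, lo, hi);

    /* MPI, not LSF, carries the partial results between hosts to rank 0. */
    long long total = 0;
    MPI_Reduce(&local, &total, 1, MPI_LONG_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum of squares below %ld = %lld\n", n, total);

    MPI_Finalize();
    return 0;
}
#endif
```

It would be built with something like mpicc -DUSE_MPI -o mympiprogram mympiprogram.c and submitted as bsub -n 3 mpirun ./mympiprogram. Each rank prints the host it landed on, while the total printed by rank 0 does not depend on how the ranks are spread across node1 and node2.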
Answer to Q3
You don't need to use LSF libraries in your program to make anything easier for LSF; as I said, what goes on inside the program is transparent to the system. The LSF libraries just enable your program to become a client of the LSF system (submitting jobs, querying them, etc.).