I am working on a multithreaded number-crunching app; let's call it myprogram. I plan to run myprogram on IBM's LSF grid. LSF allows a job to be scheduled on CPUs from different machines. For example, bsub -n 3 ... myprogram ... can allocate two CPUs from node1 and one CPU from node2.
I know that I can ask LSF to allocate all 3 cores in the same node, but I am interested in the case where my job is scheduled onto different nodes.
How does LSF manage this? Will myprogram be run in two different processes on node1 and node2?
Does LSF automatically manage data transfer between node1 and node2?
Is there anything I can do in myprogram to make this easy for LSF to manage? Should I be making use of any LSF libraries?
Answer to Q1
When you submit a job like bsub -n 3 myprogram, all LSF does is allocate 3 slots across 1-3 hosts. One of these hosts will be designated as the "first execution host", and LSF will dispatch and run a single instance of myprogram on that host. So by default myprogram runs as one process on one host, not as separate processes on node1 and node2.
If you want to run myprogram in parallel, LSF has a command called blaunch which will essentially launch one instance of a program per allocated core. For example, submitting your job like bsub -n 3 blaunch myprogram will run 3 instances of myprogram.
Answer to Q2
By "manage data transfer" I assume you mean communication between the instances of myprogram. The answer is no: LSF is a scheduling and dispatching tool. All it does is allocation and dispatch, and it has no knowledge of what the dispatched program is doing. blaunch in turn is simply a task launcher; it just launches multiple instances of a task.
What you're after here is some kind of parallel programming framework like MPI (see for example www.openmpi.org). This provides a set of APIs and commands that allow you to write myprogram in a parallel fashion.
Once you've done that and turned your program into mympiprogram, you can submit it to LSF like bsub -n 3 mpirun mympiprogram. The mpirun tool, at least in the case of OpenMPI (and some others), integrates with LSF and uses the blaunch interface under the hood to launch your tasks for you.
Answer to Q3
You don't need to use LSF libraries in your program to make anything easier for LSF. Like I said, what's going on inside the program is invisible to the system. LSF libraries just enable your program to become a client of the LSF system (submit jobs, query, etc.).
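If you want to see what LSF actually gave you before deciding how to launch work, a job script can inspect the allocation itself. The following is only a rough sketch: it assumes LSF exports the allocation in the LSB_MCPU_HOSTS environment variable as alternating host names and slot counts (for example "node1 2 node2 1"), and the describe_allocation helper is a made-up name for illustration, not part of LSF.

```shell
#!/bin/sh
# Hypothetical helper (not an LSF command): print one line per host from a
# string of alternating host names and slot counts, e.g. "node1 2 node2 1".
describe_allocation() {
    # Intentionally unquoted so the shell splits the string into words.
    set -- $1
    while [ "$#" -ge 2 ]; do
        echo "$1: $2 slot(s)"
        shift 2
    done
}

# Inside a running job, LSF is assumed to have set LSB_MCPU_HOSTS.
# Outside a job the variable is unset and nothing is printed.
if [ -n "${LSB_MCPU_HOSTS:-}" ]; then
    describe_allocation "$LSB_MCPU_HOSTS"
fi
```

Saved as, say, show_alloc.sh (a hypothetical name) and submitted with bsub -n 3 ./show_alloc.sh, this runs once on the first execution host and lists every host in the allocation along with its slot count. From there it is still up to blaunch or mpirun to start work on the other hosts.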