Can anyone tell me why I cannot successfully test OpenBLAS's dgemm performance (in GFLOPs) in R in the following way?
- link R with the "reference BLAS" libblas.so
- compile my C program mmperf.c with the OpenBLAS library libopenblas.so
- load the resulting shared library mmperf.so into R, call the R wrapper function mmperf, and report dgemm performance in GFLOPs.
Point 1 looks strange, but I have no choice because I have no root access on the machines I want to test, so actually linking R to OpenBLAS is impossible. By "not successfully" I mean my program ends up reporting dgemm performance for the reference BLAS instead of OpenBLAS. I hope someone can explain to me:
- why my way does not work;
- whether it is possible at all to make it work (this is important, because if it is impossible, I must write a C main function and do my job in a C program).
I've been investigating this issue for two days. Here I include various system output to help you make a diagnosis. To make things reproducible, I also include the code, the makefile, and the shell commands.
Part 1: system environment before testing
There are 2 ways to invoke R, either using R or Rscript. There are some differences in what is loaded when they are invoked:
~/Desktop/dgemm$ readelf -d $(R RHOME)/bin/exec/R | grep "NEEDED"
0x00000001 (NEEDED) Shared library: [libR.so]
0x00000001 (NEEDED) Shared library: [libpthread.so.0]
0x00000001 (NEEDED) Shared library: [libc.so.6]
~/Desktop/dgemm$ readelf -d $(R RHOME)/bin/Rscript | grep "NEEDED"
0x00000001 (NEEDED) Shared library: [libc.so.6]
Here we need to choose Rscript, because R loads libR.so, which will automatically load the reference BLAS libblas.so.3:
~/Desktop/dgemm$ readelf -d $(R RHOME)/lib/libR.so | grep blas
0x00000001 (NEEDED) Shared library: [libblas.so.3]
~/Desktop/dgemm$ ls -l /etc/alternatives/libblas.so.3
... 31 May /etc/alternatives/libblas.so.3 -> /usr/lib/libblas/libblas.so.3.0
~/Desktop/dgemm$ readelf -d /usr/lib/libblas/libblas.so.3 | grep SONAME
0x0000000e (SONAME) Library soname: [libblas.so.3]
By comparison, Rscript gives a cleaner environment.
Part 2: OpenBLAS
After downloading the source from OpenBLAS and running a simple make command, a shared library of the form libopenblas-<arch>-<release>.so-<version> is generated. Note that we do not have root access to install it; instead, we copy this library into our working directory ~/Desktop/dgemm and rename it simply to libopenblas.so. At the same time we have to make another copy named libopenblas.so.0, as this is the SONAME which the run-time loader will look for:
~/Desktop/dgemm$ readelf -d libopenblas.so | grep "RPATH\|SONAME"
0x0000000e (SONAME) Library soname: [libopenblas.so.0]
Note that the RPATH attribute is not given, which means this library is intended to be put in /usr/lib, and we should call ldconfig to add it to ld.so.cache. But again we don't have root access to do this. In fact, if this could be done, all the difficulties would be gone: we could then use update-alternatives --config libblas.so.3 to effectively link R to OpenBLAS.
Part 3: C code, Makefile, and R code
Here is a C source file mmperf.c computing the GFLOPs of multiplying 2 square matrices of size N:
    #include <R.h>
    #include <Rmath.h>
    #include <Rinternals.h>
    #include <R_ext/BLAS.h>
    #include <sys/time.h>

    /* standard C subroutine */
    double mmperf (int n) {
        /* local vars */
        int n2 = n * n, tmp;
        double *A, *C, one = 1.0;
        struct timeval t1, t2;
        double elapsedTime, GFLOPs;
        /* simulate N-by-N matrix A */
        A = (double *)calloc(n2, sizeof(double));
        GetRNGstate();
        for (tmp = 0; tmp < n2; tmp++) A[tmp] = runif(0.0, 1.0);
        PutRNGstate();
        /* generate N-by-N zero matrix C */
        C = (double *)calloc(n2, sizeof(double));
        /* time 'dgemm.f' for C <- A * A + C */
        gettimeofday(&t1, NULL);
        F77_CALL(dgemm) ("N", "N", &n, &n, &n, &one, A, &n, A, &n, &one, C, &n);
        gettimeofday(&t2, NULL);
        /* free memory */
        free(A); free(C);
        /* compute elapsedTime in microseconds (usec or 1e-6 sec) */
        elapsedTime = (double)(t2.tv_sec - t1.tv_sec) * 1e+6;
        elapsedTime += (double)(t2.tv_usec - t1.tv_usec);
        /* convert microseconds to nanoseconds (1e-9 sec) */
        elapsedTime *= 1e+3;
        /* compute and return GFLOPs */
        GFLOPs = 2.0 * (double)n2 * (double)n / elapsedTime;
        return GFLOPs;
    }

    /* R wrapper */
    SEXP R_mmperf (SEXP n) {
        double GFLOPs = mmperf(asInteger(n));
        return ScalarReal(GFLOPs);
    }
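As a sanity check on the units used in mmperf.c, the same arithmetic can be reproduced outside R. The sketch below uses a hypothetical shell helper named gflops (not part of the original code) that takes N and an elapsed time in microseconds, and divides 2*N^3 floating point operations by the elapsed time in nanoseconds, which yields GFLOPs directly.

```shell
# gflops N USEC  (hypothetical helper mirroring the arithmetic in mmperf.c)
# 2*N^3 flops divided by elapsed nanoseconds equals GFLOPs.
gflops() {
    awk -v n="$1" -v usec="$2" \
        'BEGIN { printf "%.2f\n", 2 * n * n * n / (usec * 1e3) }'
}

# For N = 2000 the product costs 1.6e10 flops, so 16 seconds means 1 GFLOPs.
gflops 2000 16000000   # prints 1.00
```

Read the other way round, a reported figure near 1 GFLOPs at N = 2000 means the dgemm call took on the order of 16 seconds.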
Here is a simple R script mmperf.R to report GFLOPs for the case N = 2000:
    mmperf <- function (n) {
        dyn.load("mmperf.so")
        GFLOPs <- .Call("R_mmperf", n)
        dyn.unload("mmperf.so")
        return(GFLOPs)
    }
    GFLOPs <- round(mmperf(2000), 2)
    cat(paste("GFLOPs =", GFLOPs, "\n"))
Finally there is a simple makefile to generate the shared library mmperf.so:
mmperf.so: mmperf.o
	gcc -shared -L$(shell pwd) -Wl,-rpath=$(shell pwd) -o mmperf.so mmperf.o -lopenblas

mmperf.o: mmperf.c
	gcc -fpic -O2 -I$(shell Rscript --default-packages=base --vanilla -e 'cat(R.home("include"))') -c mmperf.c
Put all these files under the working directory ~/Desktop/dgemm, and compile:
~/Desktop/dgemm$ make
~/Desktop/dgemm$ readelf -d mmperf.so | grep "NEEDED\|RPATH\|SONAME"
0x00000001 (NEEDED) Shared library: [libopenblas.so.0]
0x00000001 (NEEDED) Shared library: [libc.so.6]
0x0000000f (RPATH) Library rpath: [/home/zheyuan/Desktop/dgemm]
The output reassures us that OpenBLAS is correctly linked, and that the run-time load path is correctly set.
Part 4: testing OpenBLAS in R
Let's do
~/Desktop/dgemm$ Rscript --default-packages=base --vanilla mmperf.R
Note that our script needs only the base package in R, and --vanilla is used to ignore all user settings on R start-up. On my laptop, my program returns:
GFLOPs = 1.11
Oops! This is truly reference BLAS performance, not OpenBLAS (which is about 8-9 GFLOPs).
Part 5: Why?
To be honest, I don't know why this happens. Each step seems to work correctly. Does something subtle occur when R is invoked? For example, is there any possibility that the OpenBLAS library is overridden by the reference BLAS at some point for some reason? Any explanations and solutions? Thanks!
*********************
Solution 1:
*********************
Thanks to Employed Russian, my problem is finally solved. The investigation required important skills in Linux system debugging and patching, and I believe what I learned is a great asset. Here I post a solution, and also correct several points in my original post.
1. About invoking R
In my original post, I mentioned there are two ways to launch R, either via R or Rscript. However, I wrongly exaggerated their difference. Let's now investigate their start-up process, via an important Linux debugging facility, strace (see man strace). There are actually lots of interesting things happening after we type a command in the shell, and we can use strace to trace all system calls involving process management. As a result we can watch the fork, wait, and execution steps of a process. Though not stated in the manual page, @Employed Russian shows that it is possible to specify only a subclass of process, for example, execve for the execution steps.
For R we have
while for Rscript we have
We have also used time to measure the start-up time. Note that
- Rscript is about 5.5 times faster than R. One reason is that R will load 6 default packages on start-up, while Rscript loads only the base package, as controlled by --default-packages=base. But it is still much faster even without this setting.
- Both commands eventually execute $(R RHOME)/bin/exec/R, and in my original post, I have already exploited readelf -d to show that this executable will load libR.so, which is linked with libblas.so.3. According to @Employed Russian's explanation, the BLAS library loaded first will win, so there is no way my original method will work.
- When running strace, we have used the special file /dev/null as input file and output file when necessary. For example, Rscript demands an input file, while R demands both. We feed the null device to both to make the command run smoothly and keep the output clean. The null device is a real device file with special behavior: reading from it yields nothing, and anything written to it is discarded.
2. Cheat R
Now since libblas.so will be loaded anyway, the only thing we can do is to provide our own version of this library. As I said in the original post, if we had root access, this would be really easy: with update-alternatives --config libblas.so.3, the Linux system would complete this switch for us. But @Employed Russian offers an awesome way to cheat the system without root access: let's check how R finds the BLAS library on start-up, and make sure we feed it our version before the system default is found! To monitor how shared libraries are found and loaded, use the environment variable LD_DEBUG.
There are a number of Linux environment variables with prefix LD_, as documented in man ld.so. These variables can be assigned before an executable, so that we can change the run-time behavior of a program. Some useful variables include:
- LD_LIBRARY_PATH for setting the run-time library search path;
- LD_DEBUG for tracing look-up and loading of shared libraries;
- LD_TRACE_LOADED_OBJECTS for displaying all libraries loaded by a program (behaves similarly to ldd);
- LD_PRELOAD for forcibly injecting a library into a program at the very start, before all other libraries are looked for;
- LD_PROFILE and LD_PROFILE_OUTPUT for profiling one specified shared library. R users who have read section 3.4.1.1 sprof of Writing R Extensions should recall that this is used for profiling compiled code from within R.
The use of LD_DEBUG can be seen by:
Here we are particularly interested in using LD_DEBUG=libs. For example,
shows various attempts that the R program made to locate and load libblas.so.3. So if we could provide our own version of libblas.so.3, and make sure R finds it first, then the problem is solved.
Let's first make a symbolic link libblas.so.3 in our working path to the OpenBLAS library libopenblas.so, then extend the default LD_LIBRARY_PATH with our working path (and export it):
Now let's check again the library loading process:
Great! We have successfully cheated R.
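The two steps of this trick (the symbolic link and the extended search path) could look roughly like the sketch below. It assumes the working directory holds libopenblas.so as in Part 2 of the question; the variable name workdir is illustrative, and the final LD_DEBUG check is an assumed invocation that is left commented out because it needs R on the PATH.

```shell
# Run from the directory that holds libopenblas.so (assumption).
workdir=$(pwd)

# Make our OpenBLAS answer to the name that libR.so asks for.
# Guarded so the snippet does nothing harmful when the library is absent.
if [ -f "$workdir/libopenblas.so" ]; then
    ln -sf libopenblas.so "$workdir/libblas.so.3"
fi

# Put the working directory in front of any existing search path, so the
# loader meets our libblas.so.3 before the system one.
LD_LIBRARY_PATH="$workdir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH

# Assumed check: list the loader's attempts to resolve the BLAS library.
# LD_DEBUG=libs Rscript --default-packages=base --vanilla /dev/null 2>&1 | grep blas
```

The `${LD_LIBRARY_PATH:+:...}` form avoids a trailing colon when the variable was previously empty, since an empty entry in the search path is treated as the current directory.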
3. Experiment with OpenBLAS
Now, everything works as expected!
4. Unset LD_LIBRARY_PATH (to be safe)
It is good practice to unset LD_LIBRARY_PATH after use.
*********************
Solution 2:
*********************
Here we offer another solution, by exploiting the environment variable LD_PRELOAD mentioned in solution 1. The use of LD_PRELOAD is more "brutal", as it forces loading a given library into the program before any other library, even before the C library libc.so! This is often used for urgent patching in Linux development.
As shown in part 2 of the original post, the shared BLAS library libopenblas.so has SONAME libopenblas.so.0. An SONAME is an internal name that the dynamic library loader looks for at run time, so we need to make a symbolic link to libopenblas.so with this SONAME:
then we export it:
Note that a full path to libopenblas.so.0 needs to be fed to LD_PRELOAD for a successful load, even if libopenblas.so.0 is under $(pwd).
Now we launch Rscript and check what happens with LD_DEBUG:
Comparing with what we saw in solution 1 by cheating R with our own version of libblas.so.3, we can see that
- libopenblas.so.0 is loaded first, hence found first by Rscript;
- after libopenblas.so.0 is found, Rscript goes on searching for and loading libblas.so.3. However, this has no effect, by the "first come, first served" rule explained in the original answer.
Good, everything works, so we test our mmperf.c program:
The outcome 9.62 is bigger than the 8.77 we saw in the earlier solution merely by chance. As this is only a test of using OpenBLAS, we don't run the experiment many times for a more precise result.
Then as usual, we unset the environment variable in the end:
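Put together, the LD_PRELOAD steps of this solution, ending with the clean-up, might look like the sketch below. It assumes the working directory holds libopenblas.so; the variable name workdir is illustrative, and the Rscript line is commented out because it needs R and mmperf.R to be present.

```shell
# Run from the directory that holds libopenblas.so (assumption).
workdir=$(pwd)

if [ -f "$workdir/libopenblas.so" ]; then
    # Link named after the SONAME, then preload it by its full path.
    ln -sf libopenblas.so "$workdir/libopenblas.so.0"
    export LD_PRELOAD="$workdir/libopenblas.so.0"
    # Rscript --default-packages=base --vanilla mmperf.R
fi

# Clean up so that later commands are not affected by the preload.
unset LD_PRELOAD
```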
First, shared libraries on UNIX are designed to mimic the way archive libraries work (archive libraries were there first). In particular that means that if you have libfoo.so and libbar.so, both defining symbol foo, then whichever library is loaded first is the one that wins: all references to foo from anywhere within the program (including from libbar.so) will bind to libfoo.so's definition of foo.
This mimics what would happen if you linked your program against libfoo.a and libbar.a, where both archive libraries defined the same symbol foo. More info on archive linking here.
It should be clear from the above that if libblas.so.3 and libopenblas.so.0 define the same set of symbols (which they do), and if libblas.so.3 is loaded into the process first, then routines from libopenblas.so.0 will never be called.
Second, you've correctly decided that since R directly links against libR.so, and since libR.so directly links against libblas.so.3, it is guaranteed that libopenblas.so.0 will lose the battle.
However, you erroneously decided that Rscript is better, but it's not: Rscript is a tiny binary (11K on my system; compare to 2.4MB for libR.so), and approximately all it does is exec of R. This is trivial to see in strace output:
Which means that by the time your script starts executing, libblas.so.3 has been loaded, and the libopenblas.so.0 that will be loaded as a dependency of mmperf.so will not actually be used for anything.
As for whether it is possible at all to make it work: probably. I can think of two possible solutions:
1. Pretend that libopenblas.so.0 is actually libblas.so.3
2. Build the entire R package against libopenblas.so.
For #1, you need to ln -s libopenblas.so.0 libblas.so.3, then make sure that your copy of libblas.so.3 is found before the system one, by setting LD_LIBRARY_PATH appropriately.
This appears to work for me:
Note how I got an error (my "pretend" libblas.so.3 doesn't define the symbols expected of it, since it's really a copy of libc.so.6).
You can also confirm which version of libblas.so.3 is getting loaded this way:
For #2, you said:
"I have no root access on machines I want to test, so actual linking to OpenBLAS is impossible."
but that seems to be a bogus argument: if you can build libopenblas, surely you can also build your own version of R.
Update:
The symbols and the SONAME have nothing to do with each other.
You can see symbols in the output from readelf -Ws libblas.so.3 and readelf -Ws libopenblas.so.0. Symbols related to BLAS, such as cgemv_, will appear in both libraries.
Your confusion about SONAME possibly comes from Windows. The DLLs on Windows are designed completely differently. In particular, when FOO.DLL imports symbol bar from BAR.DLL, both the name of the symbol (bar) and the DLL from which that symbol was imported (BAR.DLL) are recorded in FOO.DLL's import table.
That makes it easy to have R import cgemv_ from BLAS.DLL, while MMPERF.DLL imports the same symbol from OPENBLAS.DLL.
However, that makes library interpositioning hard, and works completely differently from the way archive libraries work (even on Windows).
Opinions differ on which design is better overall, but neither system is likely to ever change its model.
There are ways for UNIX to emulate Windows-style symbol binding: see RTLD_DEEPBIND in the dlopen man page. Beware: these are fraught with peril, likely to confuse UNIX experts, are not widely used, and likely to have implementation bugs.
Update 2:
Yes.
Either way works.