Numpy SVD appears to parallelize on Mac OSX, but n

I want to run the following script:

#python imports
import time

#3rd party imports
import numpy as np
import pandas as pd

def pd_svd(pd_dataframe):
    # Convert to a plain ndarray and run the SVD on that array
    np_dataframe = pd_dataframe.values
    return np.linalg.svd(np_dataframe)

if __name__ == '__main__':
    li_times = []
    for i in range(1, 3):
        start = time.time()
        pd_dataframe = pd.DataFrame(np.random.random((3000, 252 * i)))
        pd_svd(pd_dataframe)
        li_times.append(str(time.time() - start))
    print(li_times)

I tried it on my 2011 MacBook Air running OS X 10.9.4 and on a 16-core cloud VM running Ubuntu 12.04. For some reason, it takes approximately 4 seconds on the MacBook Air and about 15 seconds on the VM. I inspected the processes using top, and it appeared that the Ubuntu VM was not using parallelism, while the MacBook Air was.

Below is the result of top on my MBA:

And here on my Ubuntu VM:

Any ideas why my MacBook Air is so much faster for SVD? This surprises me because, in other numpy comparisons, the cloud VM was MUCH faster and seemed to be using parallelism (I didn't check top, but it was several times as fast).

Edit:

Here is the output of np.show_config() on the cloud VM:

blas_info:
    libraries = ['blas']
    library_dirs = ['/usr/lib']
    language = f77
lapack_info:
    libraries = ['lapack']
    library_dirs = ['/usr/lib']
    language = f77
atlas_threads_info:
  NOT AVAILABLE
blas_opt_info:
    libraries = ['blas']
    library_dirs = ['/usr/lib']
    language = f77
    define_macros = [('NO_ATLAS_INFO', 1)]
atlas_blas_threads_info:
  NOT AVAILABLE
lapack_opt_info:
    libraries = ['lapack', 'blas']
    library_dirs = ['/usr/lib']
    language = f77
    define_macros = [('NO_ATLAS_INFO', 1)]
atlas_info:
  NOT AVAILABLE
lapack_mkl_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
atlas_blas_info:
  NOT AVAILABLE
mkl_info:
  NOT AVAILABLE

I suspect that the version of numpy on your cloud VM is linked only to the reference CBLAS library (*/usr/lib/libblas/libblas.so.3.0). This library is single-threaded and much slower than optimized BLAS implementations such as OpenBLAS and ATLAS.

You can confirm this by using ldd to check which libraries are dynamically linked by numpy at runtime:

~$ ldd /usr/lib/python2.7/dist-packages/numpy/core/_dotblas.so

You will probably see a line like this:

...
libblas.so.3 => /usr/lib/libblas.so.3 (0x00007f98445e3000)
...
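The _dotblas.so path above is specific to my setup and may not exist in every numpy version, so here is a rough sketch that runs ldd on every compiled extension it finds inside the numpy package and keeps only the lines that mention a BLAS/LAPACK-like library. The helper names (blas_lines, numpy_extension_modules) and the keyword list are my own choices for illustration, and the script assumes a Linux system where ldd is available.

```python
import os
import subprocess

import numpy as np


def blas_lines(ldd_output):
    """Keep only the ldd output lines that mention a BLAS/LAPACK-like library."""
    keywords = ("blas", "lapack", "atlas", "mkl")  # assumed keyword list
    return [line.strip() for line in ldd_output.splitlines()
            if any(k in line.lower() for k in keywords)]


def numpy_extension_modules():
    """Yield the path of every compiled extension (.so) inside the numpy package."""
    root = os.path.dirname(np.__file__)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".so"):
                yield os.path.join(dirpath, name)


if __name__ == "__main__":
    for so_path in numpy_extension_modules():
        try:
            out = subprocess.check_output(["ldd", so_path]).decode()
        except (OSError, subprocess.CalledProcessError):
            continue  # ldd is missing (e.g. not Linux) or rejected this file
        for line in blas_lines(out):
            print("%s: %s" % (os.path.basename(so_path), line))
```

If nothing is printed, numpy may be statically linked or may bundle its BLAS under a name the keyword list does not catch, so treat an empty result as inconclusive.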

/usr/lib/libblas.so.3 is a symbolic link. If you follow the chain of links using readlink, you'll probably see something like this:

~$ readlink -f /usr/lib/libblas.so.3
/usr/lib/libblas/libblas.so.3.0

This is the slow, single-threaded CBLAS library. Assuming you have root access, the easiest solution is probably to install OpenBLAS via apt-get:

~$ sudo apt-get install libopenblas-base libopenblas-dev

When I installed this package on my server, it updated the symlink at /usr/lib/libblas.so.3 to point at the OpenBLAS library rather than CBLAS:

~$ readlink -f /usr/lib/libblas.so.3
/usr/lib/openblas-base/libblas.so.3
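If you would rather check the symlink target from Python (for example before and after installing OpenBLAS), here is a minimal sketch. It relies on os.path.realpath, which resolves a chain of symlinks much like readlink -f does. The helper name resolve_library is just for illustration, and the path is the one from my server, so it may differ on yours.

```python
import os


def resolve_library(path):
    """Follow every symlink in `path` and return the final target."""
    return os.path.realpath(path)


if __name__ == "__main__":
    # Path taken from my own server; adjust it to whatever ldd reports for you.
    blas_path = "/usr/lib/libblas.so.3"
    if os.path.exists(blas_path):
        print(resolve_library(blas_path))
    else:
        print("%s not found" % blas_path)
```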

That should be enough to get you going with a faster BLAS library.
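To see whether the switch actually made a difference, you could time the SVD before and after changing the BLAS library. The sketch below is only a rough benchmark: the function name time_svd and the default of 3 repeats are my own choices, and the matrix shape simply mirrors the smallest case in your script.

```python
import time

import numpy as np


def time_svd(n_rows=3000, n_cols=252, repeats=3):
    """Return the best wall-clock time (in seconds) over `repeats` SVD runs."""
    a = np.random.random((n_rows, n_cols))
    best = float("inf")
    for _ in range(repeats):
        start = time.time()
        np.linalg.svd(a)
        best = min(best, time.time() - start)
    return best


if __name__ == "__main__":
    print("best SVD time: %.3f s" % time_svd())
```

Run it once with the old library and once with the new one, and watch top during the run to see whether more than one core is busy.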

If, for whatever reason, you can't solve this using apt-get, I've previously written some instructions for building numpy and OpenBLAS from source which you can find here. I've also written some instructions here for manually symlinking to a different BLAS library using update-alternatives.

*The paths I refer to in my answer are the defaults for a server running Ubuntu 14.10, where I have installed numpy using apt-get. They might differ a bit depending on your version of Ubuntu and the way in which you've installed numpy.