I want to run the following script:
# Python imports
import time

# 3rd party imports
import numpy as np
import pandas as pd


def pd_svd(pd_dataframe):
    # Extract the underlying NumPy array before decomposing it
    np_dataframe = pd_dataframe.values
    return np.linalg.svd(np_dataframe)


if __name__ == '__main__':
    li_times = []
    for i in range(1, 3):
        start = time.time()
        pd_dataframe = pd.DataFrame(np.random.random((3000, 252 * i)))
        pd_svd(pd_dataframe)
        li_times.append(str(time.time() - start))
    print(li_times)
I tried it on my 2011 MacBook Air running OS X 10.9.4 and on a 16-core cloud VM running Ubuntu 12.0.4. For some reason, it takes approximately 4 seconds on the MacBook Air and about 15 seconds on the VM. I inspected the processes using top, and it appeared that the Ubuntu VM was not using parallelism, while the MacBook Air was.
Below is the result of top on my MBA:
And here on my Ubuntu VM:
Any ideas why my MacBook Air is so much faster for SVD? In particular, when doing NumPy comparisons, the cloud VM was MUCH faster and seemed to be using parallelism (I didn't check top, but it was several times as fast).
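The timings in my script include building the DataFrame as well as the SVD itself, so a minimal sketch like the one below could help confirm that the decomposition is where the time goes. The function name time_svd_steps and its parameters are hypothetical, and it uses time.perf_counter, which assumes Python 3.3 or later:

```python
import time

import numpy as np
import pandas as pd


def time_svd_steps(n_rows, n_cols, seed=0):
    """Time DataFrame construction and the SVD call separately."""
    rng = np.random.RandomState(seed)

    t0 = time.perf_counter()
    frame = pd.DataFrame(rng.random_sample((n_rows, n_cols)))
    t1 = time.perf_counter()

    # np.linalg.svd returns (u, s, vh); only s is inspected here
    _, singular_values, _ = np.linalg.svd(frame.values)
    t2 = time.perf_counter()

    return {
        "build_seconds": t1 - t0,
        "svd_seconds": t2 - t1,
        "n_singular_values": len(singular_values),
    }


if __name__ == "__main__":
    # Same shapes as in the script above
    for i in range(1, 3):
        print(time_svd_steps(3000, 252 * i))
```

Running this on both machines should show whether the gap comes from the svd_seconds entry alone.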
Edit:
Here is the output of np.show_config() on the cloud VM:
blas_info:
    libraries = ['blas']
    library_dirs = ['/usr/lib']
    language = f77
lapack_info:
    libraries = ['lapack']
    library_dirs = ['/usr/lib']
    language = f77
atlas_threads_info:
    NOT AVAILABLE
blas_opt_info:
    libraries = ['blas']
    library_dirs = ['/usr/lib']
    language = f77
    define_macros = [('NO_ATLAS_INFO', 1)]
atlas_blas_threads_info:
    NOT AVAILABLE
lapack_opt_info:
    libraries = ['lapack', 'blas']
    library_dirs = ['/usr/lib']
    language = f77
    define_macros = [('NO_ATLAS_INFO', 1)]
atlas_info:
    NOT AVAILABLE
lapack_mkl_info:
    NOT AVAILABLE
blas_mkl_info:
    NOT AVAILABLE
atlas_blas_info:
    NOT AVAILABLE
mkl_info:
    NOT AVAILABLE
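To compare this output between the two machines more easily, the text could be parsed into a dictionary. The sketch below is only an illustration: parse_show_config is a hypothetical helper (not part of NumPy), it assumes the older text layout shown above, and SAMPLE is a shortened copy of my output. In this output only the generic blas and lapack libraries from /usr/lib are found, and every ATLAS and MKL section is NOT AVAILABLE, which may be related to the missing parallelism.

```python
import ast


def parse_show_config(text):
    """Parse legacy np.show_config() text into {section: dict or None}.

    A section maps to None when it is reported as NOT AVAILABLE.
    """
    sections = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            # Section header such as "blas_info:"
            current = line[:-1]
            sections[current] = {}
        elif current is None:
            continue
        elif line == "NOT AVAILABLE":
            sections[current] = None
        elif "=" in line and sections[current] is not None:
            key, _, value = line.partition("=")
            value = value.strip()
            try:
                parsed = ast.literal_eval(value)
            except (ValueError, SyntaxError):
                # Bare words such as f77 are kept as plain strings
                parsed = value
            sections[current][key.strip()] = parsed
    return sections


# Shortened copy of the output above
SAMPLE = """\
lapack_opt_info:
    libraries = ['lapack', 'blas']
    library_dirs = ['/usr/lib']
    language = f77
    define_macros = [('NO_ATLAS_INFO', 1)]
atlas_threads_info:
    NOT AVAILABLE
mkl_info:
    NOT AVAILABLE
"""

config = parse_show_config(SAMPLE)
available = sorted(name for name, info in config.items() if info is not None)
print(available)
print(config["lapack_opt_info"]["libraries"])
```

The same helper could be fed the output from the MacBook Air to see which sections differ.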