Optimization: alternatives to passing large array


Question:

I originally wrote a nested for loop over a test 3D array in Python. Since I wanted to apply it to a much larger array, which would take far more time, I decided to parallelise it with ipyparallel by rewriting the loop body as a function and calling bview.map. That way I could take advantage of multiple cores/nodes on a supercomputer.
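For reference, a minimal sketch of the setup involved (an assumption on my part: bview here is a load-balanced view, though a direct view's map behaves similarly):

    import ipyparallel

    # Connect to a running ipcluster; the engines correspond to the
    # cores/nodes requested on the supercomputer
    rc = ipyparallel.Client()
    # A load-balanced view distributes map tasks across all engines
    bview = rc.load_balanced_view()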

However, the code actually runs slower on the supercomputer. Profiling shows that most of the time is spent in

    method 'acquire' of 'thread.lock' objects

which, going by other Stack Exchange threads, suggests the slowdown comes from synchronization around shared data.

I tried map instead of map_sync, but in that case time.sleep takes up roughly the same amount of time instead.
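For reference, the non-blocking variant looks roughly like this (a sketch; with map the wait simply moves into get(), which is presumably where the time.sleep polling comes from):

    # map returns an AsyncResult right away instead of blocking
    amr_async = bview.map(SSIMfunc, cenx, ceny, cenz)
    # ...the blocking simply moves here; get() waits for all results
    amr = amr_async.get()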

What is the correct way to use map, or is there a better alternative?

Code snippet where the issue appears to be:

    import itertools
    import numpy

    # One (x, y, z) triple per voxel, unzipped into three argument lists
    cenx, ceny, cenz = zip(*itertools.product(range(imx), range(imy), range(imz)))
    amr = bview.map_sync(SSIMfunc, cenx, ceny, cenz)
    # map_sync returns a flat list; restore the 3D shape
    SSIMarray = numpy.asarray(amr).reshape((imx, imy, imz))

and the profiler output:

     60823909 function calls (60593868 primitive calls) in 201.869 seconds

       Ordered by: internal time

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      1089003  113.937    0.000  113.937    0.000 {method 'acquire' of 'thread.lock' objects}
        64080    5.223    0.000    6.873    0.000 uuid.py:579(uuid4)
       384352    4.933    0.000    5.145    0.000 {cPickle.dumps}
       640560    4.526    0.000    6.064    0.000 threading.py:260(__init__)
        64019    3.704    0.000   16.941    0.000 asyncresult.py:95(_init_futures)
       640560    3.338    0.000    9.402    0.000 threading.py:242(Condition)
        64077    3.222    0.000   31.562    0.000 client.py:935(_send)
       320327    2.359    0.000    8.756    0.000 _base.py:287(__init__)
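To illustrate the kind of alternative the title asks about: batching coordinates so each task covers many voxels instead of one, roughly like this (a sketch; SSIMchunk and chunk_size are hypothetical names I am introducing here, and SSIMfunc must be defined on the engines):

    import itertools
    import numpy

    def SSIMchunk(coords):
        # Hypothetical wrapper: evaluate SSIMfunc over a whole list of
        # (x, y, z) triples so each task amortises its submission overhead
        return [SSIMfunc(x, y, z) for x, y, z in coords]

    coords = list(itertools.product(range(imx), range(imy), range(imz)))
    chunk_size = 4096  # arbitrary; fewer, larger tasks mean less overhead
    chunks = [coords[i:i + chunk_size] for i in range(0, len(coords), chunk_size)]
    amr = bview.map_sync(SSIMchunk, chunks)
    # Flatten the per-chunk result lists and restore the 3D shape
    flat = [v for chunk in amr for v in chunk]
    SSIMarray = numpy.asarray(flat).reshape((imx, imy, imz))

Fewer, larger tasks would presumably cut the per-task uuid/pickle/lock overhead visible in the profile above, but I don't know if this is the intended way to use map.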