Parallel Python: How do I supply arguments to '

Posted 2019-03-28 20:19

Question:

This is only the second question with the parallel-python tag. After looking through the documentation and googling for the subject, I've come here as it's where I've had the best luck with answers and suggestions.

The following is the API (I think it's called) that submits all pertinent info to pp.

    def submit(self, func, args=(), depfuncs=(), modules=(),
               callback=None, callbackargs=(), group='default', globals=None):
        """Submits function to the execution queue

            func - function to be executed
            args - tuple with arguments of the 'func'
            depfuncs - tuple with functions which might be called from 'func'
            modules - tuple with module names to import
            callback - callback function which will be called with argument
                    list equal to callbackargs+(result,)
                    as soon as calculation is done
            callbackargs - additional arguments for callback function
            group - job group, is used when wait(group) is called to wait for
                    jobs in a given group to finish
            globals - dictionary from which all modules, functions and classes
                    will be imported, for instance: globals=globals()
        """

Here is my submit statement with its arguments:

job_server.submit(reify, (pop1, pop2, 1000), 
                  depfuncs = (key_seq, Chromosome, Params, Node, Tree), 
                  modules = ("math",), 
                  callback = sum.add, globals = globals())

All the capitalized names in depfuncs are the names of classes. I wasn't sure where to put the classes, or even whether I needed to include them at all, since they are already in the globals() dictionary. But when I ran it with depfuncs empty, it raised errors such as "Tree not defined" (for example).

Now, key_seq is a generator function, so I have to work with an instance of it in order to be able to call .next():

def key_seq():
    a = 0
    while True:
        yield a
        a = a + 1
ks = key_seq()

ks is defined in globals(). When I didn't include it anywhere else, I got an error saying 'ks is not defined'. When I include ks in depfuncs, this is the error:

Traceback (most recent call last):
  File "C:\Python26\Code\gppp.py", line 459, in <module>
    job_server.submit(reify, (pop1, pop2, 1000), depfuncs = (key_seq, ks, Chromosome, Params, Node, Tree), modules = ("math",), callback = sum.add, globals = globals())
  File "C:\Python26\lib\site-packages\pp.py", line 449, in submit
    sfunc = self.__dumpsfunc((func, ) + depfuncs, modules)
  File "C:\Python26\lib\site-packages\pp.py", line 634, in __dumpsfunc
    sources = [self.__get_source(func) for func in funcs]
  File "C:\Python26\lib\site-packages\pp.py", line 713, in __get_source
    sourcelines = inspect.getsourcelines(func)[0]
  File "C:\Python26\lib\inspect.py", line 678, in getsourcelines
    lines, lnum = findsource(object)
  File "C:\Python26\lib\inspect.py", line 519, in findsource
    file = getsourcefile(object) or getfile(object)
  File "C:\Python26\lib\inspect.py", line 441, in getsourcefile
    filename = getfile(object)
  File "C:\Python26\lib\inspect.py", line 418, in getfile
    raise TypeError('arg is not a module, class, method, '
TypeError: arg is not a module, class, method, function, traceback, frame, or code object

I'm pretty sure arg is referring to ks. So, where do I tell .submit() about ks? I don't understand what's supposed to go where. Thanks.
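For reference, the failure can be reproduced without pp at all. As the traceback shows, pp reconstructs each entry in depfuncs from its source code via the inspect module, and a generator instance has no source to fetch — only the generator function does. A minimal sketch (using Python 3's print() and next(); the question's code is Python 2.6):

```python
import inspect

def key_seq():
    a = 0
    while True:
        yield a
        a = a + 1

ks = key_seq()

# the generator *function* is something inspect can work with...
print(inspect.isfunction(key_seq))   # True

# ...but the generator *instance* is not: this raises the same
# TypeError seen in the traceback above
try:
    inspect.getsourcelines(ks)
except TypeError as e:
    print(e)
```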

Answer 1:

Interesting - are you doing genetics simulations? I ask because I see 'Chromosome' in there, and I once developed a population genetics simulation using Parallel Python.

Your approach looks really complicated. In my Parallel Python program, I used the following call:

job = jobServer.submit( doRun, (param,))

How did I get away with this? The trick is that the doRun function doesn't run in the same context as the context in which you call submit. For instance (contrived example):

import os, pp

def doRun(param):
    # os is looked up here, in the job's own process
    print "your name is %s!" % os.getlogin()

jobServer = pp.Server()
jobServer.submit(doRun, (None,))

This code will fail, because the os module doesn't exist inside doRun - doRun is not running in the same context as submit. Sure, you can pass os in the modules parameter of submit, but isn't it easier just to call import os inside doRun?

Parallel Python tries to avoid Python's GIL by running your function in a totally separate process. It tries to make this easier to swallow by letting you "pass" parameters and namespaces to your function, but it does this using hacks. For instance, your classes will be serialized using some variant of pickle and then unserialized in the new process.

But instead of relying on submit's hacks, just accept the reality that your function is going to need to do all the work of setting up its run context. You really have two main functions: one that sets up the call to submit, and one, which you call via submit, that actually does the work you need done.

If you need the next value from your generator to be available for a pp run, pass that value as a parameter too! This avoids lambda functions and generator references, and leaves you passing a simple variable.
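Concretely, draw the value from the generator before submitting, so the job receives a plain int. Sketched below with a stub standing in for the real reify, whose signature here is an assumption (shown with Python 3's next(); ks.next() is the Python 2 spelling):

```python
def key_seq():
    a = 0
    while True:
        yield a
        a = a + 1

ks = key_seq()

def reify_stub(key, n):
    # hypothetical stand-in for the real job function
    return "job %d over %d individuals" % (key, n)

# in the real program this would be something like:
#   job_server.submit(reify, (next(ks), pop1, pop2, 1000), ...)
print(reify_stub(next(ks), 1000))   # uses key 0
print(reify_stub(next(ks), 1000))   # uses key 1
```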

My code is not maintained anymore, but if you're curious, check it out here: http://pps-spud.uchicago.edu/viewvc/fps/trunk/python/fps.py?view=markup



Answer 2:

I think you should be passing in lambda: ks.next() instead of plain old ks.
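That is, a zero-argument callable that advances the generator each time it is invoked (demonstrated here with Python 3's next(); ks.next() is the Python 2 spelling). Whether pp can actually serialize such a lambda for a worker process is a separate question - Answer 1's advice of passing the drawn value instead sidesteps that entirely:

```python
def key_seq():
    a = 0
    while True:
        yield a
        a = a + 1

ks = key_seq()
get_key = lambda: next(ks)   # each call draws the next key

print(get_key())   # 0
print(get_key())   # 1
print(get_key())   # 2
```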