It is unclear how to properly time out workers of joblib's Parallel in Python. Others have asked similar questions here, here, here and here.

In my example I am using a pool of 50 joblib workers with the threading backend.
Parallel Call (threading):

    from joblib import Parallel, delayed

    output = Parallel(n_jobs=50, backend='threading')(
        delayed(get_output)(INPUT) for INPUT in list)
Here, Parallel hangs without errors as soon as len(list) <= n_jobs, but only when n_jobs = -1.
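For reference, a minimal, self-contained sketch of the hang condition described above (the two-element input list and the dummy get_output are made up for illustration):

    import time
    from joblib import Parallel, delayed

    def get_output(INPUT):
        time.sleep(1)  # stand-in for the real work
        return INPUT * 2

    list = [1, 2]  # len(list) <= n_jobs
    # with the threading backend and n_jobs = -1, this call reportedly hangs
    output = Parallel(n_jobs=-1, backend='threading')(
        delayed(get_output)(INPUT) for INPUT in list)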
To circumvent this issue, people give instructions on how to build a timeout decorator for the function that Parallel calls (get_output(INPUT) in the example above) using multiprocessing:
Main function (decorated):

    @with_timeout(10)  # multiprocessing
    def get_output(INPUT):  # threading
        output = do_stuff(INPUT)
        return output
Multiprocessing Decorator:

    import functools
    import multiprocessing.pool

    def with_timeout(timeout):
        def decorator(decorated):
            @functools.wraps(decorated)
            def inner(*args, **kwargs):
                # a fresh single-worker thread pool for every call
                pool = multiprocessing.pool.ThreadPool(1)
                async_result = pool.apply_async(decorated, args, kwargs)
                try:
                    return async_result.get(timeout)
                except multiprocessing.TimeoutError:
                    return
            return inner
        return decorator
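For comparison, the same idea can be sketched on top of concurrent.futures instead of multiprocessing.pool; this variant shuts its executor down after each call, though a timed-out task still keeps running in the background, because Python threads cannot be killed (a sketch under those assumptions, not a verified fix; with_timeout_cf is a name invented here):

    import concurrent.futures
    import functools

    def with_timeout_cf(timeout):
        def decorator(decorated):
            @functools.wraps(decorated)
            def inner(*args, **kwargs):
                # a fresh single-worker executor for every call
                executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
                future = executor.submit(decorated, *args, **kwargs)
                try:
                    return future.result(timeout=timeout)
                except concurrent.futures.TimeoutError:
                    return None
                finally:
                    # do not wait for a timed-out task to finish;
                    # its thread may linger until the work completes
                    executor.shutdown(wait=False)
            return inner
        return decorator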
Adding the multiprocessing decorator to the otherwise working code results in a memory leak after roughly twice the length of the timeout, plus a crash of Eclipse.
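One way to check whether worker threads are piling up (a hypothetical diagnostic, not part of the original setup) is to count live threads around the Parallel call:

    import threading

    # a steadily growing count across repeated Parallel calls would
    # point to leaked ThreadPool worker threads
    print(len(threading.enumerate()))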
Where is the leak in the decorator?
How do you time out threads during multiprocessing in Python?