I am developing a tool that analyzes huge files. To make it faster I introduced multiprocessing, and everything seems to work fine. I use multiprocessing.Pool to create a pool of N worker processes, which handle different chunks of work that I created beforehand:
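from multiprocessing import Pool

pool = Pool(processes=params.nthreads)  # one worker process per requested "thread"
for chunk in chunk_list:
    # submit each chunk asynchronously; the returned AsyncResult is not kept
    pool.apply_async(__parallel_quant, [filelist, chunk, outfilename])
pool.close()  # no more tasks will be submitted
pool.join()   # wait for all worker processes to exit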
As you can see, this is a standard Pool execution with nothing unusual about it.
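The loop above discards the AsyncResult that apply_async returns. Because of that, an exception raised inside `__parallel_quant` is stored in that discarded object and never reported to the parent. The following is a minimal sketch that keeps every result and calls `.get()` on it, so a worker failure is re-raised in the parent instead of going unnoticed. `run_chunks` is a hypothetical helper name, and `func` stands in for `__parallel_quant`. The Pool calls it uses exist in both Python 2.7 and 3.

```python
def run_chunks(pool, func, filelist, chunks, outfilename):
    """Submit func(filelist, chunk, outfilename) for every chunk and collect results.

    Unlike a fire-and-forget apply_async loop, this keeps every AsyncResult,
    so an exception raised inside a worker is re-raised here in the parent.
    """
    async_results = [pool.apply_async(func, (filelist, chunk, outfilename))
                     for chunk in chunks]
    pool.close()  # no more tasks will be submitted
    try:
        # get() blocks until that task is done and re-raises its exception, if any
        results = [r.get() for r in async_results]
    except Exception:
        pool.terminate()  # on the first failure, stop the remaining workers
        raise
    finally:
        pool.join()
    return results
```

Usage mirrors the original call: `run_chunks(Pool(processes=params.nthreads), __parallel_quant, filelist, chunk_list, outfilename)`.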
Lately I have run into a problem when processing a very large amount of data. Standard runs take around 2 hours with 16 processes, but I have a special case that takes around 8 hours because of the large number and size of its files.
When I run this case, execution proceeds normally until the end: most of the children finish properly, except for one that gets stuck on:
<built-in method recv of _multiprocessing.Connection object at remote 0x3698db0>
Since this child never finishes, the parent never returns from pool.join() and the execution hangs.
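One situation known to make `multiprocessing.Pool` wait forever is a worker process that dies abruptly in the middle of a task, for example one killed by the operating system. In that case the pool never receives that task's result. This is only one possible cause, not a diagnosis of the hang above. As a swapped-in alternative, Python 3's `concurrent.futures.ProcessPoolExecutor` detects such a death and fails the pending futures with `BrokenProcessPool`. The following is a minimal sketch for Python 3.7+ (it does not apply to Python 2.7 as-is). `run_with_executor` is a hypothetical name, and `func` is assumed to take a single chunk.

```python
from concurrent.futures import ProcessPoolExecutor


def run_with_executor(func, chunks, max_workers, mp_context=None):
    """Run func(chunk) for each chunk in worker processes (Python 3.7+).

    If a worker process dies abruptly, the pending futures fail with
    concurrent.futures.process.BrokenProcessPool instead of leaving the
    parent blocked forever.
    """
    with ProcessPoolExecutor(max_workers=max_workers,
                             mp_context=mp_context) as executor:
        futures = [executor.submit(func, chunk) for chunk in chunks]
        # result() re-raises worker exceptions, including BrokenProcessPool
        return [f.result() for f in futures]
```

Here the design choice is to fail loudly: the parent gets an exception it can log or retry, rather than a process that never exits.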
This only happens when the input files are very big, so I am wondering whether some kind of default timeout could be causing the problem.
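`AsyncResult.get()` accepts an optional timeout in seconds and raises `multiprocessing.TimeoutError` when it expires. Called without a timeout, `get()` waits indefinitely, and so does `pool.join()`. The following is a minimal sketch that puts one explicit deadline on the whole run and reports which chunks never delivered a result, which may help identify the input that hangs. It assumes the submission loop is changed to keep `(chunk, AsyncResult)` pairs. `wait_with_deadline` and `deadline_seconds` are hypothetical names. The code runs on Python 2.7 and 3.

```python
import multiprocessing
import time


def wait_with_deadline(pairs, deadline_seconds):
    """Wait on (chunk, AsyncResult) pairs until one shared deadline expires.

    Returns (finished, pending): `finished` is a list of (chunk, result) for
    tasks that completed in time; `pending` lists chunks still without a result.
    Exceptions raised inside a worker are re-raised by get() as usual.
    """
    deadline = time.time() + deadline_seconds
    finished, pending = [], []
    for chunk, async_result in pairs:
        # Time left before the shared deadline; 0 means "only if already done".
        remaining = max(0.0, deadline - time.time())
        try:
            finished.append((chunk, async_result.get(remaining)))
        except multiprocessing.TimeoutError:
            pending.append(chunk)
    return finished, pending
```

If `pending` is non-empty, calling `pool.terminate()` instead of relying on `pool.join()` alone lets the parent exit, and the listed chunks point at the work that never completed.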
I am using Python 2.7 (multiprocessing 0.70a1) on CentOS 7 (32 cores, 64 GB RAM).
Thanks in advance for your help
Jordi