Python multiprocessing.Pool gets stuck after a long run

I am developing a tool that analyzes huge files. To speed it up I introduced multiprocessing, and everything seemed to work fine. I use multiprocessing.Pool to create N worker processes, which handle different chunks of work that I created beforehand.

pool = Pool(processes=params.nthreads)
for chunk in chunk_list:
    pool.apply_async(__parallel_quant, [filelist, chunk, outfilename])

pool.close()
pool.join()

As you can see, this is standard pool execution, with no special usage.

Lately I have run into a problem when processing a really large amount of data. Standard runs take around 2 hours with 16 worker processes, but I have a special case that takes around 8 hours because of the number and size of its files.

The problem is that when I run this case, the execution proceeds normally until near the end: most of the child processes finish properly, except for one that gets stuck in

<built-in method recv of _multiprocessing.Connection object at remote 0x3698db0>

Since this child never finishes, the parent never wakes up and the execution hangs.

This situation only happens when the input files are very big, so I was wondering if there is any kind of default timeout that can cause this problem.

I am using Python 2.7 with multiprocessing 0.70a1,

and my machine runs CentOS 7 (32 cores, 64 GB RAM).

Thanks in advance for your help

Jordi

From the multiprocessing Programming guidelines:

Avoid shared state

As far as possible one should try to avoid shifting large amounts of data between processes.

If you have to split file processing across several processes, it is better to tell them how to retrieve the file chunks than to send the chunks themselves.

Try passing the chunk offset and the chunk size to the child process. It can then retrieve the chunk from the file itself with open() and seek(). You should see a performance improvement and a smaller memory footprint as well.
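A minimal sketch of that idea, under stated assumptions: the names make_chunks, process_chunk and count_lines_parallel are hypothetical, and counting newlines is only a placeholder for the real analysis (your __parallel_quant would go there). The point is that only small (path, offset, size) tuples travel through the pool's pipes, and each worker sends back a small result.

```python
import os
from multiprocessing import Pool


def make_chunks(path, chunk_size):
    """Return (offset, size) pairs that together cover the whole file."""
    total = os.path.getsize(path)
    return [(offset, min(chunk_size, total - offset))
            for offset in range(0, total, chunk_size)]


def process_chunk(args):
    """Worker: open the file, seek to the chunk, return a small result."""
    path, offset, size = args
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read(size)
    # Placeholder analysis (assumption): count newline bytes in this chunk.
    return data.count(b"\n")


def count_lines_parallel(path, nprocs=4, chunk_size=64 * 1024 * 1024):
    """Send only (path, offset, size) tuples to the workers, never the data."""
    tasks = [(path, offset, size)
             for offset, size in make_chunks(path, chunk_size)]
    pool = Pool(processes=nprocs)
    try:
        # map() returns the results and re-raises any worker exception.
        return sum(pool.map(process_chunk, tasks))
    finally:
        pool.close()
        pool.join()
```

From a script you would call something like count_lines_parallel("/path/to/big_file", nprocs=16) under an if __name__ == "__main__": guard. Note that fixed-size byte offsets can split a record in half; if your format is line or record based, the chunk boundaries need to be adjusted accordingly.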