Interprocess communication with Python's multiprocessing

Posted 2020-06-03 07:33

Question:

I've learned about the Pipes, Queues, shared ctypes objects, and Managers in Python's multiprocessing module, and I want to compare them with Linux's anonymous pipes, named pipes, shared memory, sockets, and so on. I have the following questions:

  • The Pipe and Queue classes of Python's multiprocessing are based on anonymous pipes. Does the module also provide named pipes?

  • Does multiprocessing.sharedctypes support communication between independent processes? I think it only supports communication between parent and child processes or between sibling processes.

  • Which of these mechanisms work only between parent/child or sibling processes, and which can be used between independent processes or across different hosts?

  • What are their respective characteristics, and how should I choose between them?

Thanks in advance.

Answer 1:

Your question is quite broad and most of the answers can be found in the multiprocessing module documentation.

Here is a short answer.

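  1. The multiprocessing Listeners and Clients let you choose named pipes as the transport medium (the 'AF_PIPE' address family, which is available on Windows; on Unix the closest option is an 'AF_UNIX' socket).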
  2. From the documentation:

    The multiprocessing.sharedctypes module provides functions for allocating ctypes objects from shared memory which can be inherited by child processes.

    You cannot use multiprocessing.sharedctypes functionality between processes that do not have a parent/child relationship.

  3. Managers, Listeners, and Clients work between processes that have no parent/child relationship, including processes on different hosts: the AF_INET socket family can be used across hosts. Nevertheless, I'd recommend against using them that way. Use plain network sockets or some other abstraction mechanism instead.
  4. Differences and characteristics are well illustrated in the documentation.
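
To make points 1 and 3 more concrete, here is a minimal sketch of a Listener/Client round trip. The names AUTHKEY, serve_once, and round_trip are made up for illustration, and the server runs in a thread only so that the example fits in one file; in practice the client would be a separate, unrelated process that knows the listener's address and the shared key.

```python
import threading
from multiprocessing.connection import Client, Listener

AUTHKEY = b"example-secret"  # hypothetical shared secret known to both sides


def serve_once(listener):
    """Accept one connection, read one message, send it back upper-cased."""
    with listener.accept() as conn:
        conn.send(conn.recv().upper())


def round_trip(message, family=None):
    """Send `message` through a Listener/Client pair and return the reply.

    family=None lets multiprocessing pick a default transport. It can also
    be "AF_INET" (TCP socket), "AF_UNIX" (Unix domain socket, Unix only)
    or "AF_PIPE" (named pipe, Windows only).
    """
    with Listener(family=family, authkey=AUTHKEY) as listener:
        # The thread stands in for an independent server process.
        server = threading.Thread(target=serve_once, args=(listener,))
        server.start()
        # The client needs only the address and the key, not a
        # parent/child relationship with the server.
        with Client(listener.address, authkey=AUTHKEY) as conn:
            conn.send(message)
            reply = conn.recv()
        server.join()
    return reply


if __name__ == "__main__":
    print(round_trip("hello"))
```

Messages are pickled on the way through, so both ends should trust each other; the authkey only authenticates the connection, it does not encrypt it.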

Python's multiprocessing module was initially modeled on the threading API. Over time it has grown in the features it supports, but the core idea remains the same: the multiprocessing module is intended for families of related Python processes. For any other use, the subprocess module is a better option.
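
As an illustration of the "process family" idea and of the sharedctypes restriction in point 2, the following sketch shares a ctypes integer between a parent and its children. The function names add_many and count_in_children are hypothetical, and the example assumes a POSIX system because it requests the "fork" start method explicitly. ctx.Value creates the same kind of object as multiprocessing.sharedctypes.Value.

```python
import ctypes
import multiprocessing as mp


def add_many(counter, times):
    """Child-side work: increment the shared integer under its lock."""
    for _ in range(times):
        with counter.get_lock():
            counter.value += 1


def count_in_children(workers=4, times=1000):
    """Start `workers` children that all update one shared c_int."""
    # Assumption: a POSIX system. "fork" is requested so that the children
    # inherit the shared object directly from the parent process.
    ctx = mp.get_context("fork")
    counter = ctx.Value(ctypes.c_int, 0)  # synchronized wrapper around a c_int
    children = [
        ctx.Process(target=add_many, args=(counter, times))
        for _ in range(workers)
    ]
    for child in children:
        child.start()
    for child in children:
        child.join()
    return counter.value


if __name__ == "__main__":
    print(count_in_children())  # 4 workers, 1000 increments each
```

The counter is handed to the children at creation time, which is exactly the inheritance the documentation quote refers to. An unrelated process started separately has no handle through which it could reach this object.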

For distributing tasks and jobs across multiple hosts, there are far better solutions that abstract away the low-level infrastructure. You can take a look at Python projects such as Celery or Luigi, or at more complex infrastructures such as Apache Mesos.