How can I inherit parent logger when using Python&

2020-05-08 01:02发布

问题:

I'm using Python's multiprocessing. I have set the logger in parent process, but I can't just simply inherit the parent's logging setting.

I don't worry about mixing up the log, for I use multiprocessing not for running jobs concurrently, but for time controlling, so only one sub process is running at same time.

My code without multiprocessing:

from multiprocessing import Process
import paramiko
import logging
import sys


def sftp_read():
    # log.debug("Child process started")  # This line will cause exception if it is run in sub process.
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    timeout = 60
    ssh.connect('my_server', username='my_user', password='my_password', timeout=timeout, auth_timeout=timeout,
                banner_timeout=timeout)
    sftp = ssh.open_sftp()
    fp = sftp.file('/home/my_user/my_file.txt')
    lines = fp.readlines()
    print ''.join(lines)
    fp.close()
    ssh.close()


def main():
    sftp_read()  # Call this function without multiprocessing

if __name__ == '__main__':
    logging.basicConfig(stream=sys.stdout,
                        format='[%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s')
    log = logging.getLogger()
    log.setLevel(logging.DEBUG)
    main()

The above code work properly, paramiko prints log normally like below:

[2018-11-20 10:38:45,051] {transport.py:1746} DEBUG - starting thread (client mode): 0x3052208L
[2018-11-20 10:38:45,051] {transport.py:1746} DEBUG - Local version/idstring: SSH-2.0-paramiko_2.4.2
[2018-11-20 10:38:45,405] {transport.py:1746} DEBUG - Remote version/idstring: SSH-2.0-OpenSSH_7.2p2 Ubuntu-4ubuntu2.6
[2018-11-20 10:38:45,405] {transport.py:1746} INFO - Connected (version 2.0, client OpenSSH_7.2p2)

But when I change the main function into the following code to control the time (limit the max running time of the SFTP reading to 15 seconds):

def main():
    # Use multiprocessing to limit the running time to at most 15 seconds.
    p = Process(target=sftp_read)
    try:
        log.debug("About to start SSH")
        p.start()
        log.debug('Process started')
        p.join(15)
    finally:
        if p.is_alive():
            p.terminate()
            log.debug('Terminated')
        else:
            log.debug("Finished normally")

Paramiko no longer prints log. Now I want to set the logging config to as same as the parent one, how can I do it?

I don't want an answer telling me to get a logger again, because in my production server there is a global logging setting and might be changed every now and then, I can't configure my own logging setting which is not controlled by the global setting.

So I wonder if there is a way allowing me to configure my sub process's logging setting as the parent one.

回答1:

In Python, subprocesses are started on POSIX standard. SubProcesses in POSIX standard are created using fork system call. The child process created using fork is essentially a copy of everything in the parent process' memory. In your case, child process will have access to logger from parent.

Warning: fork copies everything; but, does not copy threads. Any threads running in parent process do not exist in child process.

import logging
from multiprocessing import Pool
from os import getpid

def runs_in_subprocess():
    logging.info(
        "I am the child, with PID {}".format(getpid()))

if __name__ == '__main__':
    logging.basicConfig(
        format='GADZOOKS %(message)s', level=logging.DEBUG)

    logging.info(
        "I am the parent, with PID {}".format(getpid()))

    with Pool() as pool:
        pool.apply(runs_in_subprocess)

Output:

GADZOOKS I am the parent, with PID 3884
GADZOOKS I am the child, with PID 3885

Notice how child processes in your pool inherit the parent process’ logging configuration

You might get into problem of deadlocks so beware of following:

  1. Whenever the thread in the parent process writes a log messages, it adds it to a Queue. That involves acquiring a lock.

  2. If the fork() happens at the wrong time, the lock is copied in an acquired state.

  3. The child process copies the parent’s logging configuration—including the queue. Whenever the child process writes a log message, it tries to write it to the queue.

  4. That means acquiring the lock, but the lock is already acquired.

  5. The child process now waits for the lock to be released.

  6. The lock will never be released, because the thread that would release it wasn’t copied over by the fork().

In python3, you could avoid this using get_context.

from multiprocessing import get_context

def your_func():
    with get_context("spawn").Pool() as pool:
        # ... everything else is unchanged

Advice:

  1. Using get_context create a new Pool and use process' within this pool to do the job for you.

  2. Every process from the pool will have access to the parent process' log config.