Python: execute cat subprocess in parallel

Posted 2019-05-09 04:52

I run several cat | zgrep commands on a remote server and collect their output individually for further processing:

import multiprocessing as mp
import subprocess

class MainProcessor(mp.Process):
    def __init__(self, peaks_array):
        super(MainProcessor, self).__init__()
        self.peaks_array = peaks_array

    def run(self):
        for peak_arr in self.peaks_array:
            peak_processor = PeakProcessor(peak_arr)
            peak_processor.start()

class PeakProcessor(mp.Process):
    def __init__(self, peak_arr):
        super(PeakProcessor, self).__init__()
        self.peak_arr = peak_arr

    def run(self):
        command = 'ssh remote_host cat files_to_process | zgrep --mmap "regex" '
        log_lines = (subprocess.check_output(command, shell=True)).split('\n')
        process_data(log_lines)

However, this results in sequential execution of the subprocess ("ssh ... cat ...") commands: the second peak waits for the first to finish, and so on.

How can I modify this code so that the subprocess calls run in parallel, while still being able to collect the output of each one individually?

Answer 1:

Another approach (rather than the other suggestion of putting the shell processes in the background) is to use multithreading.

Instead of the run method you have, you would do something like this:

thread.start_new_thread(myFuncThatDoesZGrep, ())

To collect the results, you can do something like this:

import threading

class MyThread(threading.Thread):
    def run(self):
        self.finished = False
        # Your code to run the command here.
        blahBlah()
        # When finished, store the results first, then flip the flag,
        # so a poller never sees finished == True with no results yet.
        self.results = []
        self.finished = True

Run the thread as in the multithreading link above. Once your thread object has myThread.finished == True, you can collect the results via myThread.results.
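
A minimal sketch of that polling pattern applied to the question, assuming the ssh/zgrep command and its placeholders (remote_host, files_to_process, "regex") from the question; ZgrepThread and the fixed list of three commands are purely illustrative:

import subprocess
import threading
import time

class ZgrepThread(threading.Thread):
    """One thread per remote command, following the MyThread pattern above."""
    def __init__(self, command):
        super(ZgrepThread, self).__init__()
        self.command = command
        self.finished = False
        self.results = []

    def run(self):
        output = subprocess.check_output(self.command, shell=True)
        # store the results before flipping the flag, so a poller never
        # sees finished == True while results are still empty
        self.results = output.splitlines()
        self.finished = True

# placeholders (remote_host, files_to_process, "regex") are from the question
commands = ['ssh remote_host cat files_to_process | zgrep --mmap "regex"'] * 3
threads = [ZgrepThread(cmd) for cmd in commands]
for t in threads:
    t.start()

# poll until every thread reports it is finished, then read the results
while not all(t.finished for t in threads):
    time.sleep(0.1)
outputs = [t.results for t in threads]

Calling t.join() on each thread would avoid the busy-wait loop, but the finished/results attributes mirror the polling approach described in this answer.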



Answer 2:

You need neither multiprocessing nor threading to run subprocesses in parallel, e.g.:

#!/usr/bin/env python
from subprocess import Popen

# run commands in parallel
processes = [Popen("echo {i:d}; sleep 2; echo {i:d}".format(i=i), shell=True)
             for i in range(5)]
# collect statuses
exitcodes = [p.wait() for p in processes]

It runs 5 shell commands simultaneously. Note: neither threads nor the multiprocessing module are used here. There is no point in appending an ampersand & to the shell commands: Popen doesn't wait for the command to complete; you have to call .wait() explicitly.
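
Applied to the question's command, the same idea might look like the sketch below; peaks_array, process_data, remote_host, files_to_process and the regex are the placeholders from the question, and launching one subprocess per peak array is an assumption:

from subprocess import Popen, PIPE, STDOUT

peaks_array = [...]   # the peak arrays from the question
command = 'ssh remote_host cat files_to_process | zgrep --mmap "regex"'

# Popen returns immediately, so all commands start at once
processes = [Popen(command, shell=True, stdout=PIPE, stderr=STDOUT)
             for _ in peaks_array]

# collect each command's output separately once it has finished
for p in processes:
    log_lines = p.communicate()[0].splitlines()
    process_data(log_lines)   # process_data as defined in the question

If the commands print a lot of output, draining the pipes one after another like this can stall the other processes on full pipe buffers; the thread-pool version below collects the output in parallel and avoids that.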

It is convenient, but not necessary, to use threads to collect the output from the subprocesses:

#!/usr/bin/env python
from multiprocessing.dummy import Pool # thread pool
from subprocess import Popen, PIPE, STDOUT

# run commands in parallel
processes = [Popen("echo {i:d}; sleep 2; echo {i:d}".format(i=i), shell=True,
                   stdin=PIPE, stdout=PIPE, stderr=STDOUT, close_fds=True)
             for i in range(5)]

# collect output in parallel
def get_lines(process):
    return process.communicate()[0].splitlines()

outputs = Pool(len(processes)).map(get_lines, processes)

Related reading: Python threading multiple bash subprocesses?

Here's a code example that gets the output from several subprocesses concurrently in the same thread:

#!/usr/bin/env python3
import asyncio
import sys
from asyncio.subprocess import PIPE, STDOUT

@asyncio.coroutine
def get_lines(shell_command):
    p = yield from asyncio.create_subprocess_shell(shell_command,
            stdin=PIPE, stdout=PIPE, stderr=STDOUT)
    return (yield from p.communicate())[0].splitlines()

if sys.platform.startswith('win'):
    loop = asyncio.ProactorEventLoop() # for subprocess' pipes on Windows
    asyncio.set_event_loop(loop)
else:
    loop = asyncio.get_event_loop()

# get commands output in parallel
coros = [get_lines('"{e}" -c "print({i:d}); import time; time.sleep({i:d})"'
                    .format(i=i, e=sys.executable)) for i in range(5)]
print(loop.run_until_complete(asyncio.gather(*coros)))
loop.close()
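
For comparison, on Python 3.5+ the same approach can be written with async/await instead of the @asyncio.coroutine / yield from syntax above; a minimal sketch (asyncio.run needs Python 3.7+; on Windows, Python 3.8+ uses the Proactor event loop by default):

#!/usr/bin/env python3
import asyncio
import sys
from asyncio.subprocess import PIPE, STDOUT

async def get_lines(shell_command):
    # start the subprocess and read its combined stdout/stderr
    p = await asyncio.create_subprocess_shell(shell_command,
            stdin=PIPE, stdout=PIPE, stderr=STDOUT)
    stdout, _ = await p.communicate()
    return stdout.splitlines()

async def main():
    coros = [get_lines('"{e}" -c "print({i:d}); import time; time.sleep({i:d})"'
                       .format(i=i, e=sys.executable)) for i in range(5)]
    return await asyncio.gather(*coros)

print(asyncio.run(main()))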


Source: Python: execute cat subprocess in parallel