multiprocess plotting in matplotlib

Posted 2020-04-18 05:15

Question:

How can I use matplotlib to create figures in parallel? That is, I want a function to build the figures in parallel worker processes, and then I want to display them in the main process.

Here is an example:

# input data
import pandas as pd, matplotlib.pyplot as plt
df = pd.DataFrame(data={'i':['A','A','B','B'],
                       'x':[1.,2.,3.,4.],
                       'y':[1.,2.,3.,4.]})
df.set_index('i', inplace=True)
df.sort_index(inplace=True)

# function which creates a figure from the data
def Draw(df, i):
    fig = plt.figure(i)
    ax = fig.gca()
    df = df.loc[i,:]
    ax.scatter(df['x'], df['y'])
    return fig

def DrawWrapper(x): return Draw(*x)

# creating figures in parallel
from multiprocessing import Pool
poolSize = 2
with Pool(poolSize) as p:
    args = [(df,'A'), (df,'B')]
    figs = p.map(DrawWrapper, args)

# attempt to visualize the results
fig = plt.figure('A')
plt.show()
# FIXME: get "RuntimeError: main thread is not in main loop"

How do I transfer the figure objects from the worker processes so that I can show the figures in the main process?

Thank you for your help!

[EDIT:] It was suggested that the approach in this thread might solve the problem.

Here is the corresponding code:

# input data
import pandas as pd, matplotlib.pyplot as plt
df = pd.DataFrame(data={'i':['A','A','B','B'],
                       'x':[1.,2.,3.,4.],
                       'y':[1.,2.,3.,4.]})
df.set_index('i', inplace=True)
df.sort_index(inplace=True)

# function which creates a figure from the data
def Draw(df, i):
    fig = plt.figure(i)
    ax = fig.gca()
    df = df.loc[i,:]
    ax.scatter(df['x'], df['y'])
    plt.show()

# creating figures in parallel
from multiprocessing import Process
args = [(df,'A'), (df,'B')]
for a in args:
    p = Process(target=Draw, args=a)
    p.start()

# FIXME: result is the same (might be even worse since I do not 
# get any result which I could attempt to show):
# ...
# RuntimeError: main thread is not in main loop
# RuntimeError: main thread is not in main loop

Am I missing something?

Answer 1:

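The answer to the linked question places the code that starts the processes inside an if __name__ == "__main__": clause, so that it runs only when the file is executed as a script and not when a worker process re-imports the module. Doing the same here should work: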

import pandas as pd
import matplotlib.pyplot as plt

import multiprocessing
# multiprocessing.freeze_support()  # <- only needed for frozen Windows executables; if used, call it first thing inside the __main__ guard

df = pd.DataFrame(data={'i':['A','A','B','B'],
                       'x':[1.,2.,3.,4.],
                       'y':[1.,2.,3.,4.]})
df.set_index('i', inplace=True)
df.sort_index(inplace=True)

# function which creates a figure from the data
def Draw(df, i):
    fig, ax = plt.subplots()
    df = df.loc[i,:]
    ax.scatter(df['x'], df['y'])
    plt.show()

# creating figures in parallel
args = [(df,'A'), (df,'B')]

def multiP():
    for a in args:
        p = multiprocessing.Process(target=Draw, args=a)
        p.start()

if __name__ == "__main__":
    multiP()
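
The code above shows each figure from its own process. If the results have to end up in the main process, as the question asks, one possible approach (not part of the original answer) is to render each figure to PNG bytes inside the worker and return those bytes, since plain bytes can be pickled and sent back through a Pool. The following is only a minimal sketch under some assumptions: the helper names render_png and render_all are hypothetical, the data is kept as plain lists instead of a DataFrame, and the non-interactive Agg backend is selected so that no GUI event loop is started in the workers.

```python
import io
from multiprocessing import Pool

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no GUI loop in workers
import matplotlib.pyplot as plt

# Same values as in the question, stored as plain lists per label
# (assumption: the DataFrame is not needed for this illustration).
DATA = {"A": ([1.0, 2.0], [1.0, 2.0]), "B": ([3.0, 4.0], [3.0, 4.0])}


def render_png(label):
    """Hypothetical helper: draw one scatter plot and return it as PNG bytes."""
    x, y = DATA[label]
    fig, ax = plt.subplots()
    ax.scatter(x, y)
    ax.set_title(label)
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)  # release the figure inside the worker
    return buf.getvalue()


def render_all(labels, pool_size=2):
    """Hypothetical helper: render all labels in a pool, return {label: png_bytes}."""
    with Pool(pool_size) as pool:
        return dict(zip(labels, pool.map(render_png, labels)))


if __name__ == "__main__":
    # Only the main process reaches this point; workers just import the module.
    images = render_all(["A", "B"])
    for label, png in images.items():
        print(label, len(png), "bytes")
```

The main process receives finished images, not live Figure objects, so they can be written to files or decoded for display but are no longer interactive.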