-->

Update variable while working with ProcessPoolExec

Published 2020-03-01 02:25

Question:

if __name__ == '__main__':

    MATCH_ID = str(doc_ref2.id)

    MATCH_ID_TEAM = doc_ref3.id

    with concurrent.futures.ProcessPoolExecutor(max_workers=30) as executor:
        results = list(executor.map(ESPNPlayerFree, teamList1))

    MATCH_ID_TEAM = str(doc_ref4.id)

    with concurrent.futures.ProcessPoolExecutor(max_workers=30) as executor:
        results = list(executor.map(ESPNPlayerFree, teamList2))

When I print MATCH_ID_TEAM in the main script, it shows the updated value. Inside the worker processes, however, it still shows the default value, which I set to empty at the top.

How do I update the value of my variables to all the processes?

ESPNPlayerFree is a class that takes `id` as an argument, so `teamList1` and `teamList2` are lists of ids used to initialize my objects.

MATCH_ID and MATCH_ID_TEAM are variables that are used inside my class ESPNPlayerFree.

OS Windows 10 64bit

IDE Pycharm

Python Version 3.6.1

Answer 1:

I'm picking up where @furas left off in his comment some days ago. The simplest approach is indeed to pass everything your class needs through .map(). executor.map() expects iterables, which get zipped into one argument tuple for each call made in your workers.

You evidently need both MATCH_ID and MATCH_ID_TEAM to stay the same for a whole job, that is, for one call to executor.map(). The challenge is that both are themselves iterables (strings), but you need each one passed as a whole and repeated often enough to pair with every item of your team-list iterable.

So you simply wrap these strings with itertools.repeat() when you pass them to .map() together with the team-id list. By default, itertools.repeat() returns an infinite iterator over the passed object. Internally, ProcessPoolExecutor then uses zip() to combine the items from all iterables into arguments. Because zip() stops at the shortest iterable, the team-id list determines how many calls are made.
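To see what each worker call receives, the pairing can be reproduced without any processes. This is a minimal sketch that assumes .map() pairs its iterables the way the built-in zip() does, as described above; the sample ids are made up for illustration.

```python
from itertools import repeat

# Hypothetical sample values, chosen only for illustration.
team_ids = ["id0", "id1", "id2"]
match_id = "doc_ref2.id"
match_id_team = "doc_ref3.id"

# repeat() yields the same object forever; zip() stops as soon as the
# shortest iterable (team_ids) is exhausted, so nothing runs endlessly.
argument_tuples = list(zip(team_ids, repeat(match_id), repeat(match_id_team)))

for args in argument_tuples:
    print(args)

# Without repeat(), the string itself is iterated character by character,
# so each call would receive a single letter instead of the whole id.
wrong_tuples = list(zip(team_ids, match_id))
print(wrong_tuples)
```

Each tuple in `argument_tuples` corresponds to one call of the class in a worker. The `wrong_tuples` list shows why passing the bare strings is not enough.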

import concurrent.futures
import multiprocessing
from itertools import repeat


class ESPNPlayerFree:
    def __init__(self, team_id, match_id, match_id_team):
        self.teams_id = team_id
        self.match_id = match_id
        self.match_id_team = match_id_team
        print(
            multiprocessing.current_process().name,
            self.teams_id, self.match_id, self.match_id_team
        )


if __name__ == '__main__':

    teams1 = [f"id{i}" for i in range(10)]
    teams2 = [f"id{i}" for i in range(10, 20)]

    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:

        MATCH_ID = 'doc_ref2.id'
        MATCH_ID_TEAM = 'doc_ref3.id'

        results = list(
            executor.map(
                ESPNPlayerFree,
                teams1,
                repeat(MATCH_ID),
                repeat(MATCH_ID_TEAM),
            )
        )

        print("--- new MATCH_ID_TEAM ---")
        MATCH_ID_TEAM = 'doc_ref4.id'

        results = list(
            executor.map(
                ESPNPlayerFree,
                teams2,
                repeat(MATCH_ID),
                repeat(MATCH_ID_TEAM),
            )
        )

Output:

ForkProcess-1 id0 doc_ref2.id doc_ref3.id
ForkProcess-2 id1 doc_ref2.id doc_ref3.id
ForkProcess-3 id2 doc_ref2.id doc_ref3.id
ForkProcess-4 id3 doc_ref2.id doc_ref3.id
ForkProcess-1 id4 doc_ref2.id doc_ref3.id
ForkProcess-3 id5 doc_ref2.id doc_ref3.id
ForkProcess-2 id6 doc_ref2.id doc_ref3.id
ForkProcess-4 id7 doc_ref2.id doc_ref3.id
ForkProcess-3 id8 doc_ref2.id doc_ref3.id
ForkProcess-1 id9 doc_ref2.id doc_ref3.id
--- new MATCH_ID_TEAM ---
ForkProcess-1 id10 doc_ref2.id doc_ref4.id
ForkProcess-3 id11 doc_ref2.id doc_ref4.id
ForkProcess-2 id12 doc_ref2.id doc_ref4.id
ForkProcess-4 id13 doc_ref2.id doc_ref4.id
ForkProcess-1 id14 doc_ref2.id doc_ref4.id
ForkProcess-3 id15 doc_ref2.id doc_ref4.id
ForkProcess-2 id16 doc_ref2.id doc_ref4.id
ForkProcess-4 id17 doc_ref2.id doc_ref4.id
ForkProcess-2 id18 doc_ref2.id doc_ref4.id
ForkProcess-1 id19 doc_ref2.id doc_ref4.id

Process finished with exit code 0

For the second job, with the new MATCH_ID_TEAM, you don't have to create a new ProcessPoolExecutor. You can reuse the existing one by staying inside the context manager for as long as you need it.
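One way to keep the two jobs tidy is to wrap the .map() call in a small helper and hand it the already-open executor. This is only a sketch: `run_job` and `make_record` are hypothetical names, and `make_record` returns a plain tuple in place of constructing ESPNPlayerFree so the results are easy to inspect.

```python
import concurrent.futures
from itertools import repeat


def make_record(team_id, match_id, match_id_team):
    # Hypothetical stand-in for ESPNPlayerFree(team_id, match_id, match_id_team).
    return (team_id, match_id, match_id_team)


def run_job(executor, team_ids, match_id, match_id_team):
    # One job: every team id is paired with the same two match ids.
    return list(
        executor.map(make_record, team_ids, repeat(match_id), repeat(match_id_team))
    )


if __name__ == '__main__':
    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
        # Both jobs share the same pool; only the arguments change.
        first = run_job(executor, ["id0", "id1"], 'doc_ref2.id', 'doc_ref3.id')
        second = run_job(executor, ["id10", "id11"], 'doc_ref2.id', 'doc_ref4.id')

    print(first)
    print(second)
```

Since the changing values travel as arguments, nothing in the workers depends on module-level variables being updated.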