I'm very new to multi-threading. I have 2 functions in my python script. One function enqueue_tasks
iterates through a large list of small items and performs a task on each item which involves appending an item to a list (lets call it master_list
). This I already have multi-threaded using futures.
executor = concurrent.futures.ThreadPoolExecutor(15) # Arbitrarily 15
futures = [executor.submit(enqueue_tasks, group) for group in grouper(key_list, 50)]
concurrent.futures.wait(futures)
I have another function process_master
that iterates through the master_list
above and checks the status of each item in the list, then does some operation.
Can I use the same method above to use multi-threading for process_master
? Furthermore, can I have it running at the same time as enqueue_tasks
? What are the implications of this? process_master
is dependent on the list from enqueue_tasks
, so will running them at the same time be a problem? Is there a way I can delay the second function a bit? (using time.sleep
perhaps)?
No, this isn't safe. If
enqueue_tasks
andprocess_master
are running at the same time, you could potentially be adding items tomaster_list
insideenqueue_tasks
at the same timeprocess_master
is iterating over it. Changing the size of an iterable while you iterate over it causes undefined behavior in Python, and should always be avoided. You should use athreading.Lock
to protect the code that adds items tomaster_list
, as well as the code that iterates overmaster_list
, to ensure they never run at the same time.Better yet, use a
Queue.Queue
(queue.Queue
in Python 3.x) instead of alist
, which is a thread-safe data structure. Add items to theQueue
inenqueue_tasks
, andget
items from theQueue
inprocess_master
. That wayprocess_master
can safely run a the same time asenqueue_tasks
.