I'm making a program for running simulations in Python, with a wxPython interface. In the program, you can create a simulation, and the program renders (=calculates) it for you. Rendering can be very time-consuming sometimes.
When the user starts a simulation, and defines an initial state, I want the program to render the simulation continuously in the background, while the user may be doing different things in the program. Sort of like a YouTube-style bar that fills up: You can play the simulation only up to the point that was rendered.
Should I use multiple processes, multiple threads, or something else? People told me to use the multiprocessing package. I checked it out and it looks good, but I also heard that processes, unlike threads, can't share a lot of information (and I think my program will need to share a lot of information). Additionally, I heard about Stackless Python: is it a separate option? I have no idea.
Please advise.
A process has its own memory space. It makes it more difficult to share information, but also makes the program safer (less need for explicit synchronization). That being said, processes can share the same memory in read-only mode.
A thread is cheaper to create or kill, but the main difference is that it shares memory with other threads in the same process. This is sometimes risky, and in addition crashing the process would kill all threads.
One advantage of using multiple processes over multiple threads is that it would be easier to scale your program to work with multiple machines that communicate via network protocols.
For example, you could potentially run 16 processes on 8 dual-core machines, but you would not benefit from more than 4 threads on a quad-core machine. If the amount of information you need to communicate is low, multiprocessing may make more sense.
As for the YouTube-style bar you've described, I would say that suggests multiprocessing. If you follow an MVC approach, your GUI should not also contain the model (the calculation result). With multiprocessing, the GUI can then talk to a work manager that reports which data is already available.
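To make the work-manager idea concrete, here is a minimal sketch, not a definitive design. It assumes the simulation state is something small and picklable; `render_step`, `render_worker`, and `drain` are hypothetical names invented for illustration. The worker process pushes each finished step onto a `multiprocessing.Queue`, and the GUI side polls that queue without blocking, so the "rendered so far" bar can grow while the user does other things.

```python
import multiprocessing as mp
import queue  # only needed for the queue.Empty exception
import time


def render_step(state):
    """Hypothetical stand-in for one expensive simulation step."""
    return state + 1


def render_worker(initial_state, n_steps, results):
    """Render steps one by one and put each finished state on `results`."""
    state = initial_state
    for index in range(n_steps):
        state = render_step(state)
        results.put((index, state))
    results.put(None)  # sentinel: rendering is finished


def drain(results, rendered):
    """Move everything currently queued into `rendered` without blocking.

    Returns True once the worker's end-of-render sentinel has been seen.
    """
    while True:
        try:
            item = results.get_nowait()
        except queue.Empty:
            return False
        if item is None:
            return True
        rendered.append(item)


if __name__ == "__main__":
    results = mp.Queue()
    worker = mp.Process(target=render_worker, args=(0, 5, results), daemon=True)
    worker.start()

    rendered = []
    done = False
    while not done:
        # In a GUI this call would live in a timer callback, not a loop.
        done = drain(results, rendered)
        time.sleep(0.01)
    worker.join()
    print("rendered up to step", len(rendered))
```

In a wxPython program the polling loop would presumably be replaced by a periodic timer event that calls `drain` and refreshes the progress bar; that wiring is left out here because it depends on how the application is structured.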
"I checked it out and it looks good, but I also heard that processes, unlike threads, can't share a lot of information..."
This is only partially true.
Threads are part of a process -- threads share memory trivially. Which is as much of a problem as a help -- two threads with casual disregard for each other can overwrite memory and create serious problems.
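As a rough illustration of what "regard for each other" looks like in practice, here is a small sketch in which several threads update one shared counter. The `Counter` class and `run_threads` helper are made up for this example. The read-modify-write on `value` is wrapped in a `threading.Lock`, so no update can be lost however the threads are interleaved.

```python
import threading


class Counter:
    """A shared counter whose updates are protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may run the read-modify-write below.
        with self._lock:
            self.value += 1


def run_threads(counter, n_threads, n_increments):
    """Start `n_threads` threads that each increment the counter."""

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value


if __name__ == "__main__":
    print(run_threads(Counter(), n_threads=4, n_increments=1000))
```

Every piece of shared mutable state needs this kind of care, which is the cost that comes with the "trivial" sharing.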
Processes, however, share information through a lot of mechanisms. A POSIX pipeline (a | b) means that process a and process b share information -- a writes it and b reads it. This works out really well for a lot of things.
The operating system will schedule your processes across every available core as quickly as you create them. This, too, works out really well for a lot of things.
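The same a | b arrangement can be set up from Python with the standard `subprocess` module. This is only a sketch: the two one-line child programs (`PRODUCER` and `CONSUMER`) are placeholders invented for illustration, standing in for a real renderer and a real consumer of its output.

```python
import subprocess
import sys

# Hypothetical stand-ins for "a" and "b": a writes numbers, b sums them.
PRODUCER = "for i in range(5): print(i)"
CONSUMER = "import sys; print(sum(int(line) for line in sys.stdin))"


def run_pipeline():
    """Run the equivalent of `a | b` and return b's output as text."""
    a = subprocess.Popen(
        [sys.executable, "-c", PRODUCER],
        stdout=subprocess.PIPE,
    )
    b = subprocess.Popen(
        [sys.executable, "-c", CONSUMER],
        stdin=a.stdout,
        stdout=subprocess.PIPE,
        text=True,
    )
    a.stdout.close()  # so a is notified if b exits before reading everything
    output, _ = b.communicate()
    a.wait()
    return output.strip()


if __name__ == "__main__":
    print(run_pipeline())
```

The two children run as separate processes, so the operating system is free to place them on different cores.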
Stackless Python is unrelated to this discussion -- it's faster and has different thread scheduling. But I don't think threads are the best route for this.
"I think my program will need to share a lot of information."
You should resolve this first. Then, determine how to structure processes around the flow of information. A "pipeline" is very easy and natural to do; any shell will create the pipeline trivially.
A "server" is another architecture where multiple client processes get and/or put information into a central server. This is a great way to share information. You can use the WSGI reference implementation as a way to build a simple, reliable server.
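For what it's worth, a bare-bones version of the server idea using the standard library's `wsgiref` could look like the sketch below. The `RENDERED` dictionary, the GET/POST convention, and port 8000 are all assumptions made up for this example: a POST records one more rendered frame, and any request returns the current count as JSON.

```python
import json
from wsgiref.simple_server import make_server
from wsgiref.util import setup_testing_defaults

# Hypothetical central state that client processes read and update.
RENDERED = {"frames": 0}


def app(environ, start_response):
    """WSGI application: POST adds a frame, every request reports the count."""
    if environ["REQUEST_METHOD"] == "POST":
        RENDERED["frames"] += 1
    body = json.dumps(RENDERED).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def serve(port=8000):
    """Serve `app` on localhost until interrupted (blocks forever)."""
    with make_server("127.0.0.1", port, app) as httpd:
        httpd.serve_forever()
```

Calling `serve()` in its own process gives the workers and the GUI one place to put and get information; `setup_testing_defaults` is imported only because it is handy for exercising `app` without a running server.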
Very puzzled. Bastien Léonard rightly pointed out that the GIL will stop any ability to use threading in any useful way. His reference states:
This being the case, multiprocessing is then the sensible choice. In my own experience, multithreading (MT) in Python is of no noticeable benefit to the user.