I am writing a Python application in the field of scientific computing. Currently, when the user works with the GUI and starts a new physics simulation, the interpreter immediately imports several necessary modules for this simulation, such as Traits and Mayavi. These modules are heavy and take too long to import, and the user has to wait ~10 seconds before he can continue, which is bad.
I thought of something that might remedy this. I'll describe it and perhaps someone else has already implemented it, if so please give me a link. If not I might do it myself.
What I want is a separate thread that will import modules asynchronously. It will probably be a subclass of threading.Thread.
Here's a usage example:
importer_thread = ImporterThread()
importer_thread.start()
# ...
importer_thread.request_import('Mayavi')
importer_thread.request_import('Traits')
# request_import() is a thread-safe method that will put the module name
# into a queue which the thread processes in an infinite loop
# ...
# When the user actually needs the modules:
import Mayavi, Traits
# If they were already loaded by importer_thread, we're good.
# If not, we'll just have to wait as usual.
So do you know of anything like this? If not, do you have any suggestions about the design?
The problem with this is that the imports must still complete before they are usable. Depending on when they're first used, the application could still have to block for 10 seconds before it could start up anyway. Much more productive would be to profile the modules and figure out why they take so long to import.
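As a first step, something along these lines can show where the time actually goes (a sketch using the standard cProfile module; the module names here follow the question and may need adjusting to the real package names):

import cProfile

# Profile the import itself; the cumulative column shows which
# sub-imports and module-level initialization eat the ~10 seconds.
cProfile.run("import Mayavi", sort="cumulative")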
Why not just do this when the app starts?
import threading

def background_imports():
    import Traits
    import Mayavi

thread = threading.Thread(target=background_imports)
thread.setDaemon(True)
thread.start()
The general idea is good, but the Python/GUI session might not be all that responsive while the background thread is importing away; unfortunately, import inherently and inevitably "locks up" Python substantially (it's not just the GIL, there's specific extra locking for imports).
Still worth trying, as it might make things a bit better -- it's also very easy, since Queues are intrinsically thread-safe and, besides a Queue's put and get, all you need is basically an __import__. Still, don't be surprised if this doesn't help enough and you still need extra oomph.
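For concreteness, here is a minimal sketch of such a thread, assuming the ImporterThread class and request_import() method named in the question (hypothetical names, not an existing library), built from a Queue plus __import__ (Python 3 spelling; on Python 2 the queue module is named Queue):

import queue
import threading

class ImporterThread(threading.Thread):
    # Daemon thread that imports modules whose names arrive on a queue.
    def __init__(self):
        super().__init__(daemon=True)   # don't keep the app alive at exit
        self._queue = queue.Queue()     # Queue.put/get are thread-safe

    def request_import(self, module_name):
        # Thread-safe: just enqueue the name; run() does the actual work.
        self._queue.put(module_name)

    def run(self):
        while True:
            name = self._queue.get()
            try:
                # Fills sys.modules, so a later 'import name' in the GUI
                # thread becomes a cheap dictionary lookup.
                __import__(name)
            except ImportError:
                pass                    # module not installed: give up quietly

With that, the usage example from the question works as written: request_import() returns immediately, and the eventual import statement merely finds the module already in sys.modules if the thread got to it first.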
If you have some drive that's intrinsically very fast, but with limited space, such as a "RAM drive" or a particularly snippy solid-state one, it may be worth keeping the needed packages in a .tar.bz2 (or other form of archive) and unpacking it onto the fast drive at program start (that's essentially just I/O and so it won't lock things up badly -- I/O operations rapidly release the GIL -- and also it's especially easy to delegate to a subprocess running tar xjf or the like).
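A sketch of that subprocess delegation, with hypothetical paths for the archive and the fast drive:

import subprocess
import sys

ARCHIVE = '/opt/myapp/heavy_packages.tar.bz2'   # hypothetical archive path
FAST_DIR = '/mnt/ramdisk'                       # hypothetical fast mount point

# tar does the I/O in a child process, so the main interpreter stays responsive.
unpack = subprocess.Popen(['tar', 'xjf', ARCHIVE, '-C', FAST_DIR])
# ... later, just before the heavy imports are actually needed:
unpack.wait()
sys.path.insert(0, FAST_DIR)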
If some of the import slowness is due to a huge number of .py/.pyc/.pyo files, it's worth a try to keep those (in .pyc form only, not as .py) in a zipfile and importing from there (but that only helps with the I/O overhead, depending on your OS, filesystem, and drive: doesn't help with delays due to loading huge DLLs or executing initialization code in packages at load time, which I suspect are likelier culprits for the slowness).
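Importing from a zipfile needs no special machinery beyond putting the archive on sys.path (the archive path and package name below are placeholders):

import sys

# Hypothetical zip archive holding the packages' .pyc files.
sys.path.insert(0, '/opt/myapp/compiled_libs.zip')

import some_pure_python_package    # resolved via zipimport from the archive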
You could also consider splitting the application up with multiprocessing -- again using Queues (but of the multiprocessing kind) to communicate -- so that both imports and some heavy computations are delegated to a few auxiliary processes and thus made asynchronous (this may also help fully exploiting multiple cores at once). I suspect this may unfortunately be hard to arrange properly for visualization tasks (such as those you're presumably doing with mayavi) but it might help if you also have some "pure heavy computation" packages and tasks.
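A rough sketch of that split, assuming a hypothetical compute-heavy task handed to a worker process over multiprocessing queues (numpy stands in here for whatever heavy package the computation needs):

import multiprocessing

def compute_worker(tasks, results):
    # The heavy import happens here, in the worker process, so the GUI
    # process never blocks on it and never touches the import lock for it.
    import numpy
    while True:
        data = tasks.get()
        if data is None:                 # sentinel: shut the worker down
            break
        results.put(numpy.asarray(data).sum())

if __name__ == '__main__':
    tasks = multiprocessing.Queue()
    results = multiprocessing.Queue()
    worker = multiprocessing.Process(target=compute_worker,
                                     args=(tasks, results), daemon=True)
    worker.start()
    tasks.put([1.0, 2.0, 3.0])
    print(results.get())                 # 6.0, computed in the auxiliary process
    tasks.put(None)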
"the user works with the GUI and starts a new physics simulation"
Not really clear. Does "works with the GUI" mean double-clicking? Double-clicking what? Some wxWidgets GUI application? Or IDLE?
If so, what does "starts a new physics simulation" mean? Clicking a button somewhere else? A GUI button to bring up a panel where they write code? Or do they import a script they wrote offline?
Why is the import happening before the simulation starts? How long does a simulation take? What does the GUI show?
I suspect that there's a way to be much, much lazier in doing the big imports. But from the description, it's hard to determine if there's a point in time where the import doesn't matter as much to the user.
Threads don't help much. What helps is rethinking the UI experience.