I have successfully built a RESTful microservice with Python asyncio and aiohttp that listens to a POST event to collect realtime events from various feeders.
It then builds an in-memory structure to cache the last 24h of events in a nested defaultdict/deque structure.
Now I would like to periodically checkpoint that structure to disk, preferably using pickle.
Since the memory structure can be >100MB I would like to avoid holding up my incoming event processing for the time it takes to checkpoint the structure.
I'd rather create a snapshot copy (e.g. deepcopy) of the structure and then take my time to write it to disk and repeat on a preset time interval.
I have been searching for examples on how to combine threads (and is a thread even the best solution for this?) and asyncio for that purpose but could not find something that would help me.
Any pointers to get started are much appreciated!
It's pretty simple to delegate a method to a thread or sub-process using `BaseEventLoop.run_in_executor`.
As for whether to use a `ProcessPoolExecutor` or `ThreadPoolExecutor`, that's kind of hard to say; pickling a large object will definitely eat some CPU cycles, which initially would make you think `ProcessPoolExecutor` is the way to go. However, passing your 100MB object to a `Process` in the pool would require pickling the instance in your main process, sending the bytes to the child process via IPC, unpickling it in the child, and then pickling it again so you can write it to disk. Given that, my guess is the pickling/unpickling overhead will be large enough that you're better off using a `ThreadPoolExecutor`, even though you're going to take a performance hit because of the GIL.
That said, it's very simple to test both ways and find out for sure, so you might as well do that.
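For the thread-based variant, a minimal sketch could look like the following. The function names (`write_checkpoint`, `checkpoint_once`, `checkpoint_forever`) and the `path` and `interval` parameters are hypothetical, and it assumes Python 3.7+ for `asyncio.get_running_loop()`. The `deepcopy` still runs on the event loop thread, so it blocks for as long as the copy takes; only the pickling and the disk write are moved to the executor. It also assumes the cached structure is picklable, which rules out a `lambda` as a `defaultdict` factory.

```python
import asyncio
import copy
import os
import pickle


def write_checkpoint(snapshot, path):
    """Blocking helper: pickle the snapshot to a temp file, then swap it in."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(snapshot, f, protocol=pickle.HIGHEST_PROTOCOL)
    # os.replace is atomic on the same filesystem, so a reader never
    # sees a half-written checkpoint file.
    os.replace(tmp_path, path)


async def checkpoint_once(cache, path, executor=None):
    """Snapshot the cache, then write it from a worker thread."""
    # Copy on the event loop thread so no handler mutates the structure
    # mid-copy. This part is synchronous and blocks the loop briefly.
    snapshot = copy.deepcopy(cache)
    loop = asyncio.get_running_loop()
    # executor=None uses the loop's default ThreadPoolExecutor.
    await loop.run_in_executor(executor, write_checkpoint, snapshot, path)


async def checkpoint_forever(cache, path, interval, executor=None):
    """Repeat the checkpoint every `interval` seconds until cancelled."""
    while True:
        await asyncio.sleep(interval)
        await checkpoint_once(cache, path, executor)
```

The periodic task could be started with something like `asyncio.ensure_future(checkpoint_forever(cache, "events.pickle", 300))` once the loop is running, and cancelled on shutdown. Swapping in a `ProcessPoolExecutor` for the `executor` argument is enough to compare the two approaches.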
I also used `run_in_executor`, but I found this function kinda gross under most circumstances, since it requires `partial()` for keyword args and I'm never calling it with anything other than a single executor and the default event loop. So I made a convenience wrapper around it with sensible defaults and automatic keyword argument handling.
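The wrapper itself isn't shown here, so the following is only a guess at what such a helper might look like, not the original code. The name `run_blocking` and its `executor` keyword are hypothetical; it assumes Python 3.7+ and falls back to the loop's default thread pool when no executor is given.

```python
import asyncio
import functools


async def run_blocking(func, *args, executor=None, **kwargs):
    """Run a blocking callable in an executor, forwarding keyword args.

    The `executor` keyword is consumed by this wrapper, so it cannot be
    forwarded to `func` under that name.
    """
    loop = asyncio.get_running_loop()
    # run_in_executor only forwards positional arguments, so keyword
    # arguments are bound up front with functools.partial.
    call = functools.partial(func, *args, **kwargs)
    # executor=None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(executor, call)
```

With a helper like this, a call site shrinks to `await run_blocking(pickle.dump, snapshot, f, protocol=4)` instead of building the `partial()` by hand each time.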