multiprocessing memory usage and twisted/gevent

Posted 2019-09-08 08:33

Question:

so my... err... app does the following:

  • listen on a queue for 'work'
  • spawns about 100 workers per server (across ~3 servers), each listening on the queue
  • each worker basically does some networky stuff (ssh, snmp etc) (i/o intensive), then churns the output (very cpu intensive)

i have it all working under multiprocessing and it works great. however: each worker is using way more memory than i would like (about 30MB RES, 450MB VIRT according to top). so i have two questions:

  • what is the best way for me to determine why the overhead is so high? i'm guessing COW isn't working too well... what modules could i use to get a snapshot of all of the main thread's memory prior to multiprocessing so i can try to reduce the initial footprint?

  • given that most of my processes are cpu bound, would there be any benefit to porting my code over to gevent/twisted? i would like to make use of the dual hex-cores of each server.

thanks!

Answer 1:

There was a great talk at PyCon that explains memory usage in Python. It is definitely half an hour well spent.

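The bottom line is that to really know how much memory is used, you should not rely on top output: the RES figure for each process includes pages it shares with other processes, so adding it up across workers overcounts. Instead, check how much memory you have free before and after starting your 100 workers.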
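As a rough sketch of that before/after measurement, assuming a Linux host where /proc/meminfo is available (the helper names parse_meminfo, available_kb and per_worker_kb are made up for this example):

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field name: value}.

    Most fields are reported in kB; the unit suffix is ignored here.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def available_kb(path="/proc/meminfo"):
    """Return the kernel's estimate of available memory, in kB."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    # MemAvailable exists on Linux 3.14 and later; fall back to MemFree.
    return info.get("MemAvailable", info["MemFree"])


def per_worker_kb(before_kb, after_kb, workers):
    """Average real cost of one worker, from system-wide readings."""
    return (before_kb - after_kb) / workers
```

Call available_kb() once before starting the workers and once after they have settled, then pass both readings to per_worker_kb(). Other activity on the machine will add noise, so treat the result as an estimate.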



Answer 2:

CPython uses reference counting to implement memory management for all Python objects. The way this works is that each Python object is represented as a struct and each struct has a field in it giving the reference count. Whenever a new reference is made to the object, the reference count in that field is incremented. Whenever a reference to the object is given up, the reference count in that field is decremented. Once the reference count is zero the interpreter can be pretty sure the Python object is no longer needed and can free the memory allocated to the struct representing it.

Lots of things change the reference count of an object. Passing it to a function or assigning it to a (local or global) variable or an attribute of an object will increment the reference count (so will lots of other operations). The reverse of these decrements the reference count: for example, returning from a function decrements the reference count of all locals.
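You can watch this happen with sys.getrefcount. The sketch below (refcount_deltas is just an illustrative name) reports changes relative to a baseline rather than absolute counts, because the absolute number includes temporary references and varies between CPython versions:

```python
import sys


def refcount_deltas():
    """Show how ordinary operations move an object's reference count."""
    obj = object()
    base = sys.getrefcount(obj)

    alias = obj  # a second name for the same object
    after_alias = sys.getrefcount(obj) - base

    holder = [obj]  # storing it in a container adds another reference
    after_holder = sys.getrefcount(obj) - base

    del alias  # giving references up brings the count back down
    holder.clear()
    after_release = sys.getrefcount(obj) - base

    return after_alias, after_holder, after_release
```

Every one of those steps writes to the memory that holds obj, even though the code never modifies the object in any way that is visible from Python.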

The reason all that is relevant to your question is that it should give you some idea of why the copy-on-write behavior you get out of fork() isn't going to help you save a whole lot of memory. Almost immediately, the CPython runtime is going to visit a large portion of the memory pages (a page is the base unit that copy-on-write works on, often 4 kB, perhaps larger) and replace lots of 2s with 3s or 4s with 3s or whatever. Each such write forces the kernel to copy the whole page for the writing process, so much of the memory for the process ends up being copied.
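To get a feel for the granularity, here is a small sketch that counts how many distinct pages a set of objects starts on. It relies on the CPython detail that id() returns the object's memory address, and pages_spanned is a hypothetical helper, not a standard API:

```python
import mmap


def pages_spanned(objects, page_size=mmap.PAGESIZE):
    """Count the distinct memory pages on which the given objects start.

    Assumes CPython, where id(obj) is the object's address.
    """
    return len({id(obj) // page_size for obj in objects})


objects = [object() for _ in range(10000)]
print(len(objects), "objects start on", pages_spanned(objects), "pages")
```

Touching the reference count of just one object on a page after fork() is enough for the child to get its own private copy of that entire page.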

An event-driven system will help with this by letting you do many I/O-bound tasks concurrently. You can still use multiple processes (at least with Twisted) to take advantage of the extra CPU resources you have at your disposal. A single, event-driven process can do all of the necessary networking and then hand off the resulting data to worker processes that get to use the rest of your CPUs. You can be more precise in what code you run in those extra processes, though. From your question, I suspect you think that your workers don't need everything that has been loaded into your "main" process. Using Twisted's process management APIs, they won't have to spend any memory on those things.
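The shape of that design can be sketched with the standard library. Note that this uses asyncio as a stand-in for Twisted, not Twisted's own process API, and that fetch_output, crunch and WORKER_SOURCE are placeholders invented for the example. The point is that the worker is a freshly started interpreter that loads only what it needs, instead of a fork of the main process:

```python
import asyncio
import sys

# Placeholder for the CPU-heavy step. It runs in a new interpreter that
# imports nothing beyond sys, so it carries none of the parent's modules.
WORKER_SOURCE = """
import sys
text = sys.stdin.read()
print(sum(len(word) for word in text.split()))
"""


async def fetch_output(host):
    """Stand-in for the ssh/snmp step; a real version would await network I/O."""
    await asyncio.sleep(0)
    return f"output from {host}"


async def crunch(text):
    """Send text to a separate worker process and return its numeric result."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", WORKER_SOURCE,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
    )
    stdout, _ = await proc.communicate(text.encode())
    return int(stdout.decode().strip())


async def handle(host):
    """Do the I/O in the event loop, then hand the data to a worker."""
    return await crunch(await fetch_output(host))


async def main(hosts):
    return await asyncio.gather(*(handle(host) for host in hosts))


def run(hosts):
    return asyncio.run(main(hosts))
```

Starting one process per job keeps the sketch short. In practice you would probably keep a small pool of long-lived workers, roughly one per core, and feed them jobs over their pipes.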