Memory leak in IPython.parallel module?

Published 2019-02-18 05:00

I'm using IPython.parallel to process a large amount of data on a cluster. The remote function I run looks like:

def evalPoint(point, theta):
    # do some complex calculation
    return (cost, grad)

which is invoked by this function:

def eval(theta, client, lview, data):
    async_results = []
    for point in data:
        # evaluate current data point
        ar = lview.apply_async(evalPoint, point, theta)
        async_results.append(ar)

    # wait for all results to come back
    client.wait(async_results)

    # and retrieve their values
    values = [ar.get() for ar in async_results]

    # unzip data from original tuple
    totalCost, totalGrad = zip(*values)

    avgGrad = np.mean(totalGrad, axis=0)
    avgCost = np.mean(totalCost, axis=0)

    return (avgCost, avgGrad)
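For reference, the unzip-and-average step behaves like this on toy (cost, grad) tuples (a self-contained illustration with made-up values, not part of the original code):

```python
import numpy as np

# two fake (cost, grad) results, shaped like what evalPoint returns
values = [(1.0, np.array([1.0, 2.0])),
          (3.0, np.array([3.0, 4.0]))]

# zip(*values) transposes the list of tuples into two sequences
totalCost, totalGrad = zip(*values)    # (1.0, 3.0) and a pair of arrays

avgCost = np.mean(totalCost, axis=0)   # 2.0
avgGrad = np.mean(totalGrad, axis=0)   # array([2., 3.])
```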

If I run the code:

client = Client(profile="ssh")
client[:].execute("import numpy as np")        

lview = client.load_balanced_view()

for i in xrange(100):
    eval(theta, client, lview, data)

the memory usage keeps growing until I eventually run out of memory (the machine has 76 GB). I've simplified evalPoint to do nothing in order to make sure it wasn't the culprit.

The first part of eval was copied from IPython's documentation on how to use the load balancer. The second part (unzipping and averaging) is fairly straightforward, so I don't think that's responsible for the memory leak. Additionally, I've tried manually deleting objects in eval and calling gc.collect(), with no luck.

I was hoping someone with IPython.parallel experience could point out something obvious I'm doing wrong, or could confirm that this is in fact a memory leak.

Some additional facts:

  • I'm using Python 2.7.2 on Ubuntu 11.10
  • I'm using IPython version 0.12
  • I have engines running on servers 1-3, and the client and hub running on server 1. I get similar results if I keep everything on just server 1.
  • The only thing I've found similar to a memory leak for IPython had to do with %run, which I believe was fixed in this version of IPython (also, I am not using %run)

Update

Also, I tried switching the result logging from the in-memory backend to SQLiteDB, in case that was the problem, but I still see the same growth.
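For anyone trying the same switch: the database backend is selected in the controller's config file. The snippet below is a sketch based on the IPython 0.12 class paths; if you are on a different version, treat the exact paths as an assumption and check your own installation:

```python
# ipcontroller_config.py, in the profile directory
# (class paths assumed from IPython 0.12 -- verify against your version)
c = get_config()

# the default backend is the in-memory DictDB; SQLite spills history to disk
c.HubFactory.db_class = "IPython.parallel.controller.sqlitedb.SQLiteDB"

# or, if a mongod instance is available:
# c.HubFactory.db_class = "IPython.parallel.controller.mongodb.MongoDB"
```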

Response (1)

The memory consumption is definitely in the controller (I verified this by (a) running the client on another machine and (b) watching top). I hadn't realized that the non-SQLiteDB backends would also consume memory, so I hadn't bothered purging.

If I use DictDB and purge, I still see the memory consumption go up, but at a much slower rate. It was hovering around 2GB for 20 invocations of eval().

If I use MongoDB and purge, it looks like mongod is taking around 4.5GB of memory and ipcluster about 2.5GB.

If I use SQLite and try to purge, I get the following error:

File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/controller/hub.py", line 1076, in purge_results
  self.db.drop_matching_records(dict(completed={'$ne':None}))
File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/controller/sqlitedb.py", line 359, in drop_matching_records
  expr,args = self._render_expression(check)
File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/controller/sqlitedb.py", line 296, in _render_expression
  expr = "%s %s"%null_operators[op]
TypeError: not enough arguments for format string
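That TypeError is an ordinary string-formatting slip inside sqlitedb.py: when the right-hand operand of % is a single string, it fills only one of the two %s slots. A minimal self-contained reproduction (the null_operators entry below is made up for illustration):

```python
# illustrative stand-in for the dict used in sqlitedb.py
null_operators = {'$ne': 'IS NOT NULL'}
op = '$ne'

try:
    # one string supplied to two %s slots -> TypeError
    expr = "%s %s" % null_operators[op]
except TypeError as e:
    message = str(e)   # "not enough arguments for format string"

# the fix is to pass both values as a tuple:
expr = "%s %s" % ("completed", null_operators[op])
```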

So, I think if I use DictDB, I might be okay (I'm going to try a run tonight). I'm not sure if some memory consumption is still expected or not (I also purge in the client like you suggested).

1 Answer
霸刀☆藐视天下
2019-02-18 05:31

Is it the controller process that is growing, or the client, or both?

The controller remembers all requests and all results, so the default behavior of storing this information in a simple dict will result in constant growth. Using a db backend (sqlite or preferably mongodb if available) should address this, or the client.purge_results() method can be used to instruct the controller to discard any/all of the result history (this will delete them from the db if you are using one).

The client itself caches all of its own results in its results dict, so this, too, will result in growth over time. Unfortunately, this one is a bit harder to get a handle on, because references can propagate in all sorts of directions, and is not affected by the controller's db backend.

This is a known issue in IPython, but for now you should be able to clear the references manually by deleting the entries in the client's results/metadata dicts; if your view is sticking around, it has its own results dict as well:

# ...
# and retrieve their values
values = [ar.get() for ar in async_results]

# clear references to the local cache of results:
for ar in async_results:
    for msg_id in ar.msg_ids:
        del lview.results[msg_id]
        del client.results[msg_id]
        del client.metadata[msg_id]

Or, you can purge the entire client-side cache with simple dict.clear():

lview.results.clear()
client.results.clear()
client.metadata.clear()

Side note:

Views have their own wait() method, so you shouldn't need to pass the Client to your function at all. Everything should be accessible via the View, and if you really need the client (e.g. for purging the cache), you can get it as view.client.
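Putting this side note together with the cache-clearing advice, the asker's eval() could be restructured to take only the view. The sketch below is hypothetical (evalBatch and the func parameter are illustrative names, not IPython API); it only assumes view behaves like a LoadBalancedView: apply_async() returning AsyncResults, a wait() method, and the results/metadata caches described above:

```python
import numpy as np

def evalBatch(view, func, data, theta):
    """Evaluate func over data via a load-balanced view, then
    clear the client-side result caches so memory stays flat."""
    async_results = [view.apply_async(func, point, theta) for point in data]

    view.wait(async_results)           # Views have their own wait()
    values = [ar.get() for ar in async_results]

    # drop cached results on the view and its client
    view.results.clear()
    view.client.results.clear()
    view.client.metadata.clear()

    totalCost, totalGrad = zip(*values)
    return (np.mean(totalCost, axis=0), np.mean(totalGrad, axis=0))
```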
