I'm using IPython.parallel to process a large amount of data on a cluster. The remote function I run looks like:
    def evalPoint(point, theta):
        # do some complex calculation
        return (cost, grad)
which is invoked by this function:
    def eval(theta, client, lview, data):
        async_results = []
        for point in data:
            # evaluate current data point
            ar = lview.apply_async(evalPoint, point, theta)
            async_results.append(ar)
        # wait for all results to come back
        client.wait(async_results)
        # and retrieve their values
        values = [ar.get() for ar in async_results]
        # unzip data from original tuple
        totalCost, totalGrad = zip(*values)
        avgGrad = np.mean(totalGrad, axis=0)
        avgCost = np.mean(totalCost, axis=0)
        return (avgCost, avgGrad)
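As a quick sanity check (plain NumPy, no cluster involved), the unzip-and-average step at the end of eval can be exercised on its own with made-up (cost, grad) tuples, which is one way to rule it out as the source of the growth:

```python
import numpy as np

# simulated (cost, grad) results, as evalPoint would return them
values = [(1.0, np.array([1.0, 2.0])),
          (3.0, np.array([3.0, 4.0]))]

# same unzip-and-average pattern as in eval()
totalCost, totalGrad = zip(*values)
avgGrad = np.mean(totalGrad, axis=0)  # elementwise mean of the gradients
avgCost = np.mean(totalCost, axis=0)  # mean of the scalar costs
```

This holds no references beyond the small input list, so nothing here should accumulate across invocations.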
If I run the code:
    client = Client(profile="ssh")
    client[:].execute("import numpy as np")
    lview = client.load_balanced_view()
    for i in xrange(100):
        eval(theta, client, lview, data)
the memory usage keeps growing until I eventually run out of memory (76 GB). I've simplified evalPoint to do nothing, to make sure it wasn't the culprit.
The first part of eval was copied from IPython's documentation on how to use the load balancer. The second part (unzipping and averaging) is fairly straightforward, so I don't think that's responsible for the memory leak. Additionally, I've tried manually deleting objects in eval and calling gc.collect(), with no luck.
I was hoping someone with IPython.parallel experience could point out something obvious I'm doing wrong, or could confirm that this is in fact a memory leak.
Some additional facts:
- I'm using Python 2.7.2 on Ubuntu 11.10
- I'm using IPython version 0.12
- I have engines running on servers 1-3, and the client and hub running on server 1. I get similar results if I keep everything on just server 1.
- The only thing I've found resembling a memory leak in IPython had to do with %run, which I believe was fixed in this version of IPython (also, I am not using %run)
Update
Also, I tried switching the hub's result logging from the in-memory backend to SQLiteDB, in case that was the problem, but the memory usage still grows the same way.
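For reference, this is roughly how the hub's database backend is selected in `ipcontroller_config.py` (a sketch assuming IPython 0.12's profile layout; the `HubFactory.db_class` setting is the relevant knob, but check your version's docs). This is a config fragment, not a standalone script:

```python
# ipcontroller_config.py -- choose the hub's result-database backend
c = get_config()

# in-memory dict (entries persist until explicitly purged):
c.HubFactory.db_class = "IPython.parallel.controller.dictdb.DictDB"

# or SQLite on disk:
# c.HubFactory.db_class = "IPython.parallel.controller.sqlitedb.SQLiteDB"
```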
Response (1)
The memory consumption is definitely in the controller (I could verify this by: (a) running the client on another machine, and (b) watching top). I hadn't realized that non SQLiteDB would still consume memory, so I hadn't bothered purging.
If I use DictDB and purge, I still see the memory consumption go up, but at a much slower rate. It was hovering around 2GB for 20 invocations of eval().
If I use MongoDB and purge, it looks like mongod is taking around 4.5GB of memory and ipcluster about 2.5GB.
If I use SQLite and try to purge, I get the following error:
      File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/controller/hub.py", line 1076, in purge_results
        self.db.drop_matching_records(dict(completed={'$ne':None}))
      File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/controller/sqlitedb.py", line 359, in drop_matching_records
        expr,args = self._render_expression(check)
      File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/controller/sqlitedb.py", line 296, in _render_expression
        expr = "%s %s"%null_operators[op]
    TypeError: not enough arguments for format string
So, I think if I use DictDB, I might be okay (I'm going to try a run tonight). I'm not sure if some memory consumption is still expected or not (I also purge in the client like you suggested).
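The growth pattern is consistent with result caching: every apply_async stores its result under a fresh msg_id in dictionaries (the hub's database, plus the client's own results/metadata caches) that nothing empties unless explicitly purged. A toy model of that mechanism (plain Python with hypothetical names, no IPython needed) shows why memory grows with the number of tasks rather than the size of live data:

```python
import uuid

results_cache = {}  # stands in for client.results / the hub's result DB


def fake_apply_async(value):
    # each submitted task gets a fresh msg_id; its result is cached forever
    msg_id = uuid.uuid4().hex
    results_cache[msg_id] = value
    return msg_id


# submit many tasks; nothing ever removes the cached results
for i in range(1000):
    fake_apply_async([0.0] * 100)

n_before = len(results_cache)  # grows linearly with tasks submitted
results_cache.clear()          # the analogue of purging
n_after = len(results_cache)
```

In the real API, the analogous cleanup would be purging on the hub and clearing the client-side caches between invocations.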