Complete Working Test Case
Of course depending on your memory on the local and remote machines your array sizes will be different.
z1 = numpy.random.rand(300000000,2);
for i in range(1000):
print('*******************************************\n');
direct_output = subprocess.check_output('ssh blah@blah "ls /"', shell=True);
direct_output = 'a'*1200000;
a2 = direct_output*10;
print(len(direct_output));
Current Use Case
In case it helps my use case is as follows:
I issue db queries then store the resulting tables on the remote machine. I then want to transfer them across a network and do analysis. Thus far I have been doing something like the following in python:
#run a bunch of queries before hand with the results in remote files
....
counter = 0
mergedDataFrame = None
while NotDone:
output = subprocess.check_output('ssh blah@blah cat /data/file%08d'%(counter))
data = pandas.read_csv(...)
#do lots of analysis, append, merge, numpy stuff etc...
mergedDataFrame = pandas.merge(...)
counter += 1
At some point I receive the following error at the check_output command: [Errno 12] Cannot allocate memory
Background
Thanks to the below questions I think I have an idea of what is wrong. There are a number of solutions posted, and I am trying to determine which of the solutions will avoid the [Errno 12] Cannot allocate memory error associated with the subprocess implementation using fork/clone.
Python subprocess.Popen "OSError: [Errno 12] Cannot allocate memory" This gives the underlying diagnosis and suggests some workaround like spawning separate script etc...
Understanding Python fork and memory allocation errors Suggests using rfoo to circumvent the subprocess limitation of fork/clone and spawning child process and copy memory etc... This seems to imply a client-server model
What is the simplest way to SSH using Python? , but I have the additional constraints that I cannot use subprocess due to memory limitations and fork/clone implementation? The solutions suggests using paramiko or something built on top of it, others suggest subprocess (which I have found will not work in my case).
There were other similar questions but the answers often talked about file descriptors being the culprit (in this case they are not), adding more RAM to the system ( I cannot do this), upgrading to x64 ( I already am on x64). Some hint at the problem of ENOMEM. A few answers mention trying to determine if the subprocess.Popen (in my case check_output) is not properly cleaning the processes, but it looks like S. Lott and others agree that the subprocess code itself is properly cleaning up.
- Python memory allocation error using subprocess.Popen
- Python IOError cannot allocate memory although there is plenty
- Cannot allocate memory on Popen commands
- Python subprocess.Popen erroring with OSError: [Errno 12] Cannot allocate memory after period of time
I have searched through the source code on github https://github.com/paramiko/paramiko/search?q=Popen&type=Code and it appears to use subprocess in the proxy.py file.
Actual Questions
Does this mean that ultimately paramiko is using the Popen solution described above that will have problems when the python memory footprint grows and repeated Popen calls are made due to the clone/fork implementation?
If paramiko will not work is there another way to do what I am looking for with a client side only solution? Or will a client/server/socket solution be needed? If so will any of rfoo, tornado, or zeromq, http transfers work here?
Notes I am running 64bit linux 8GB main memory. I do not want to pursue the options of buying more RAM.
If you are running out of memory, you may want to increase your swap memory. Or you might have no swap enabled at all. In Ubuntu (it should work for other distributions as well) you can check your swap by:
if it is empty it means you don't have any swap enabled. To add a 1GB swap:
Add the following line to the
fstab
to make the swap permanent.Source and more information can be found here.
This should do it:
http://docs.python.org/3.3/library/subprocess.html#replacing-os-popen-os-popen2-os-popen3
That way, you can read lines or blocks, instead of the entire thing at once.
If you're running out of memory, it's probably because subprocess is trying to read too much into memory at the same time. The solution, other than using a redirect to a local file, is probably to use popen-like functionality with an stdin/stdout pair that can be read from a little at a time.