On Machine1, I have a Python2.7 script that computes a big (up to 10MB) binary string in RAM that I'd like to write to a disk file on Machine2, which is a remote machine. What is the best way to do this?
Constraints:
Both machines are Ubuntu 13.04. The connection between them is fast -- they are on the same network.
The destination directory might not yet exist on Machine2, so it might need to be created.
If it's easy, I would like to avoid writing the string from RAM to a temporary disk file on Machine1. Does that eliminate solutions that might use a system call to rsync?
Because the string is binary, it might contain bytes that could be interpreted as a newline. This would seem to rule out solutions that might use a system call to the echo command on Machine2.
I would like this to be as lightweight on Machine2 as possible. Thus, I would like to avoid running services like ftp on Machine2 or engaging in other configuration activities there. Plus, I don't understand security that well, and so would like to avoid opening additional ports unless truly necessary.
I have ssh keys set up on Machine1 and Machine2, and would like to use them for authentication.
EDIT: Machine1 is running multiple threads, and so it is possible that more than one thread could attempt to write to the same file on Machine2 at overlapping times. I do not mind the inefficiency caused by having the file written twice (or more) in this case, but the resulting datafile on Machine2 should not be corrupted by simultaneous writes. Maybe an OS lock on Machine2 is needed?
I'm rooting for an rsync solution, since it is a self-contained entity that I understand reasonably well, and requires no configuration on Machine2.
You open a new SSH process to Machine2 using subprocess.Popen and then write your data to its stdin.
import subprocess

# mkdir -p plus cat on Machine2; cat copies stdin verbatim, so binary
# data containing newline bytes is preserved.
cmd = ['ssh', 'user@machine2',
       'mkdir -p output/dir; cat - > output/dir/file.dat']
p = subprocess.Popen(cmd, stdin=subprocess.PIPE)

your_inmem_data = 'foobarbaz\0' * 1024 * 1024
for chunk_ix in range(0, len(your_inmem_data), 1024):
    chunk = your_inmem_data[chunk_ix:chunk_ix + 1024]
    p.stdin.write(chunk)

p.stdin.close()   # EOF tells the remote cat to finish
p.wait()
I've just verified that it works as advertised and copies all of the 10485760 dummy bytes.
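To address the EDIT about concurrent writers, a small variation of the same command avoids corruption without any locking on Machine2: each connection streams into its own temporary file and then renames it into place, and a rename on the same filesystem is atomic. A sketch, reusing the paths from the example above (the .$$ suffix, the remote shell's PID, is just one way to get a per-connection temp name):

import subprocess

# Each writer streams to its own temp file, then mv renames it over
# file.dat. The rename is atomic on the same filesystem, so readers never
# see a half-written file; with several writers the last rename wins.
cmd = ['ssh', 'user@machine2',
       'mkdir -p output/dir && '
       'cat - > output/dir/file.dat.$$ && '
       'mv output/dir/file.dat.$$ output/dir/file.dat']
p = subprocess.Popen(cmd, stdin=subprocess.PIPE)
p.stdin.write(your_inmem_data)   # your_inmem_data as defined above
p.stdin.close()
p.wait()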
P.S. A potentially cleaner/more elegant solution would be to have the Python program write its output to sys.stdout instead and do the piping to ssh externally:
$ python process.py | ssh <the same ssh command>
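A minimal sketch of what process.py might look like in that setup (the data here is just the same dummy string as above):

# process.py -- writes the in-memory binary string to stdout; the shell
# pipeline above feeds it to ssh, so nothing touches Machine1's disk.
import sys

your_inmem_data = 'foobarbaz\0' * 1024 * 1024
sys.stdout.write(your_inmem_data)
sys.stdout.flush()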
Paramiko supports opening files on remote machines:
import paramiko

def put_file(machinename, username, dirname, filename, data):
    """Write `data` to dirname/filename on the remote machine over SFTP."""
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(machinename, username=username)   # uses your ssh keys/agent
    sftp = ssh.open_sftp()
    try:
        sftp.mkdir(dirname)        # create the directory if it is missing
    except IOError:
        pass                       # already exists
    f = sftp.open(dirname + '/' + filename, 'w')
    f.write(data)
    f.close()
    ssh.close()
data = 'This is arbitrary data\n'.encode('ascii')
put_file('v13', 'rob', '/tmp/dir', 'file.bin', data)
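One caveat: sftp.mkdir() creates only a single path component, so if the destination directory on Machine2 can be nested and missing, it has to be created level by level. A hypothetical helper (mkdir_p is my name, not part of Paramiko) might look like this, assuming an absolute dirname:

def mkdir_p(sftp, dirname):
    # Walk the absolute path and create each missing component;
    # components that already exist raise IOError, which we ignore.
    path = ''
    for part in dirname.strip('/').split('/'):
        path += '/' + part
        try:
            sftp.mkdir(path)
        except IOError:
            pass

You would then call mkdir_p(sftp, dirname) inside put_file instead of the single sftp.mkdir(dirname).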
If just calling a subprocess is all you want, maybe sh.py could be the right thing.
from sh import ssh

remote_host = ssh.bake(<remote host>)
remote_host.dd(_in=<your binary string>, of=<output filename on remote host>)
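A slightly fuller sketch under a few assumptions of my own: the host and paths below are placeholders, the destination directory is created first with mkdir -p, and the dd operand is passed as a plain positional string, since sh typically converts keyword arguments into --long-option form, which dd would not accept:

from sh import ssh

remote_host = ssh.bake('user@machine2')      # authenticates with your ssh keys
remote_host.mkdir('-p', 'output/dir')        # create the directory if needed
remote_host.dd('of=output/dir/file.dat',     # dd wants of=..., not --of=...
               _in=your_binary_string)       # feed the in-memory data as stdin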
A solution in which you don't explicitly send your data over some connection would be to use sshfs. You can use it to mount a directory from Machine2 somewhere on Machine1; writing to a file in that directory will then automatically result in the data being written to Machine2.
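A minimal sketch of that approach, assuming the remote tree is already mounted at /mnt/machine2 (e.g. with sshfs user@machine2:/data /mnt/machine2, done once outside the script) and reusing the dummy data from above:

import errno
import os

dest_dir = '/mnt/machine2/output/dir'        # lives on Machine2 via sshfs
try:
    os.makedirs(dest_dir)                    # create the directory if missing
except OSError as e:
    if e.errno != errno.EEXIST:
        raise

your_inmem_data = 'foobarbaz\0' * 1024 * 1024
with open(os.path.join(dest_dir, 'file.dat'), 'wb') as f:
    f.write(your_inmem_data)                 # an ordinary local write; sshfs ships the bytes over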