Marshal (Ruby) pipes: sending serialized object to

I need to serialize an object in Ruby with Marshal and send it to a sub-process via pipes. How can I do this?

My code looks like the following, and my questions are in comments:

data = Marshal.dump(data)
#call sub-process
`ruby -r a_lib -e 'a_method'` #### how to send the stdout to the subprocess?

And the a_method looks like:

def a_method
  ...
  data = Marshal.load(data) #### how to load the stdout of the parent process?
  ...
end

Answer 1:

Yes, you can send serialized objects through a pipe between different processes!

Let me show you how I do it.

In this example a master process starts a child process, and the child process then transmits a simple Hash object back to the master using Marshal serialization.

Master source code:

First, it is useful to declare a helper method run_ruby on the Process module:

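#encoding: UTF-8
require 'rbconfig'
module Process
  # Full path to the running Ruby interpreter.
  # (The 'BASERUBY' config entry names the ruby used to build this one
  # and is not a reliable executable name, so RbConfig.ruby is used instead.)
  RUBY = RbConfig.ruby
  # @param [String] command script path and arguments, passed to the shell as-is
  # @param [Hash] options redirection options for spawn, e.g. {:out => wr}
  # @return [Integer] pid of the spawned process
  def Process.run_ruby(command, options)
    spawn("#{Process::RUBY} -- #{command}", options)
  end
end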

This code just locates the ruby executable and saves its full path in the RUBY constant.

Important: if you are going to use JRuby or some other executable, rewrite this code so that it provides the path to that executable!

Next, we start the child process.
At this point we can override STDIN, STDOUT and STDERR for the new process.
Let us create a pipe and redirect the child's STDOUT to it:

  rd, wr = IO.pipe
  Process.run_ruby("./test/pipetest.rb param1 param2", {:out => wr})
  wr.close

Note the options hash {:out => wr}: it tells spawn to redirect the child's STDOUT to the write end (wr) of the pipe.

You can also pass parameters (see param1 and param2) on the command line.

Note that we call wr.close because the parent does not use the write end in this example. Leaving it open would also keep the parent from ever seeing end-of-file on rd.
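As a minimal, self-contained sketch of the redirect described above: the inline -e script below is only a stand-in for ./test/pipetest.rb (an assumption made so the example runs on its own), and RbConfig.ruby is used directly instead of the run_ruby helper.

```ruby
require 'rbconfig'

# Create a pipe: the child writes into `wr`, the parent reads from `rd`.
rd, wr = IO.pipe

# The inline script stands in for ./test/pipetest.rb (assumption for this sketch).
# It simply echoes its command-line parameters to STDOUT.
pid = Process.spawn(RbConfig.ruby, '-e', 'puts ARGV.join(",")',
                    'param1', 'param2', out: wr)

# Close the parent's copy of the write end, otherwise rd.read never sees EOF.
wr.close

child_output = rd.read   # everything the child printed to its STDOUT
rd.close
Process.wait(pid)        # reap the child

puts child_output
```

Passing the command as separate arguments skips the shell, so no quoting of the parameters is needed.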

How the master receives the object:

message = rd.gets         # read message header with size in bytes
cb = message[5..-1].to_i  # message is in form: "data <byte_size>\n"
data = rd.read(cb)        # read message with binary object
puts "Parent read #{data.length} from #{cb} bytes:"
obj = Marshal::load(data) # unserialize object
puts obj.inspect
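If several objects travel over the same pipe, the "data <byte_size>\n" header plus payload framing can be wrapped in a pair of helpers. This is only a sketch: write_message and read_message are hypothetical names, and both ends of the pipe live in one process here just to keep the example self-contained.

```ruby
# Hypothetical helpers (not part of the original answer) for the
# "data <byte_size>\n" + payload framing.
def write_message(io, obj)
  payload = Marshal.dump(obj)
  io.write("data #{payload.bytesize}\n") # header with the payload size in bytes
  io.write(payload)                      # binary payload
  io.flush
end

def read_message(io)
  header = io.gets                       # e.g. "data 42\n"; nil at end-of-file
  return nil if header.nil?
  size = Integer(header[/\Adata (\d+)\n\z/, 1]) # raises on a malformed header
  Marshal.load(io.read(size))
end

rd, wr = IO.pipe
rd.binmode
wr.binmode

write_message(wr, { 1 => 'asd', 'data' => 255, 0 => 0.55 })
write_message(wr, [1, 2, 3])
wr.close

first  = read_message(rd)
second = read_message(rd)
third  = read_message(rd) # nil, because the writer has closed the pipe
rd.close
```

Because the header carries the exact byte count, the reader never depends on newlines inside the binary payload.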

Child source code:

Now, how is the serialized object transmitted?
First the child serializes the object,
then it sends the parent a header message in the form: "data <byte_size>\n"
After that it sends the serialized object itself.
The child process writes the object to STDOUT, since that is the channel we redirected into the pipe.

#encoding: UTF-8

# obj is an example Hash object to be transmitted
obj = {
    1 => 'asd',
    'data' => 255,
    0 => 0.55
}

data = Marshal::dump(obj)             # serializing object (obj)
$stdout.puts "data #{data.length}"    # sending message header
$stdout.write data                    # sending message itself
$stdout.flush                         # Important: flush data!

In the code above, the child process simply outputs one serialized object and terminates.
Of course, you can program much more complex behavior.
For instance, I start many child processes that all share one and the same pipe to the parent process on STDOUT. To keep two children from writing to the pipe at the same time, I use a system-level mutex (not a Ruby Mutex) to control access to the pipe.
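The answer does not say which system-level mutex is used. One possible stand-in, shown here purely as an assumption, is an advisory file lock taken with File#flock: every child opens the (hypothetical) lock file itself and holds an exclusive lock while it writes its header and payload. The sketch uses fork, so it is POSIX-only.

```ruby
require 'tmpdir'

# Hypothetical lock file; any path all children can open would do.
lock_path = File.join(Dir.tmpdir, "pipe-lock-#{Process.pid}")
rd, wr = IO.pipe

pids = 3.times.map do |i|
  fork do
    rd.close
    payload = Marshal.dump({ worker: i, pid: Process.pid })
    # Each child opens the file itself so it gets its own lock handle.
    File.open(lock_path, File::RDWR | File::CREAT) do |lock|
      lock.flock(File::LOCK_EX)   # blocks while another process holds the lock
      wr.write("data #{payload.bytesize}\n")
      wr.write(payload)
      wr.flush
    end                           # closing the file releases the lock
    wr.close
    exit!(0)                      # leave without running inherited at_exit handlers
  end
end
wr.close                          # parent keeps only the read end

results = []
while (header = rd.gets)          # nil once every child has closed the pipe
  size = header[5..-1].to_i
  results << Marshal.load(rd.read(size))
end
rd.close

pids.each { |pid| Process.wait(pid) }
File.delete(lock_path) if File.exist?(lock_path)
```

The lock matters because the header and the payload are two separate writes; without it, another child's header could land between them.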

Answer 2:

You can use the IO::pipe method.

I don't think you've chosen the best way to create the child process. Backticks fork and exec a shell behind the scenes, and the shell then forks and execs again to run the ruby command. This means that your command:

`ruby -r a_lib -e 'a_method'`

does the following: forks the current process, turns it into a shell process, forks the shell process, and turns that into a ruby process.

I suggest using the fork method:

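payload = Marshal.dump(data)

reader, writer = IO.pipe

# call sub-process (fork before closing anything, so the child inherits both ends)
pid = fork do
  writer.close # child process only reads from the pipe
  data = Marshal.load(reader.read) # read up to EOF; the binary payload may contain newlines
  reader.close
  # whatever needs to be done with data
end

reader.close # parent process is on the writing side of the pipe
writer.write payload
writer.close # signals EOF to the child
Process.wait(pid)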
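If fork is not available, or the child has to be a separate ruby command as in the question, IO.popen can connect a pipe to the child's STDIN instead. The following is a sketch under assumptions: the inline child_code script is hypothetical and stands in for `ruby -r a_lib -e 'a_method'`, and the child simply prints a sum so the parent has something to read back.

```ruby
require 'rbconfig'

# Hypothetical child script, passed inline with -e to keep the sketch self-contained.
child_code = <<~'RUBY'
  $stdin.binmode
  obj = Marshal.load($stdin)        # Marshal.load accepts an IO and reads one object
  $stdout.puts obj['numbers'].sum
RUBY

obj = { 'numbers' => [1, 2, 3] }

# 'r+' opens the child with both its STDIN and STDOUT connected to `io`.
result = IO.popen([RbConfig.ruby, '-e', child_code], 'r+') do |io|
  io.binmode
  io.write(Marshal.dump(obj))       # send the serialized object to the child's STDIN
  io.close_write                    # signals EOF on the child's STDIN
  io.read                           # collect whatever the child printed
end

puts result
```

Passing the command as an array starts ruby directly, without the intermediate shell that backticks would add for this command.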