How do I deserialize in Psych to return an existing object, such as a class object?
To serialize a class, I can do:
require "psych"
class Class
  yaml_tag 'class'
  def encode_with coder
    coder.represent_scalar 'class', name
  end
end
yaml_string = Psych.dump(String) # => "--- !<class> String\n...\n"
However, if I call Psych.load on that string, I get an anonymous class rather than the String class.
The normal deserialization method is Object#init_with(coder), but that only changes the state of the existing anonymous class, whereas I want the String class itself.
Psych::Visitors::ToRuby#visit_Psych_Nodes_Scalar(o) has cases where, rather than modifying an existing object with init_with, it makes sure the right object is created in the first place (for example, calling Complex(o.value) to deserialize a complex number), but I don't think I should be monkeypatching that method.
Am I doomed to working with low-level or medium-level emitting, or am I missing something?
Background
I'll describe the project, why it needs classes, and why it needs (de)serialization.
Project
The Small Eigen Collider aims to create random tasks for Ruby to run. The initial aim was to see if the different implementations of Ruby (for example, Rubinius and JRuby) returned the same results when given the same random tasks, but I've found that it's also good for detecting ways to segfault Rubinius and YARV.
Each task is composed of the following:
receiver.send(method_name, *parameters, &block)
where receiver is a randomly chosen object, method_name is the name of a randomly chosen method, and *parameters is an array of randomly chosen objects. &block is not very random: it's basically equivalent to {|o| o.inspect}.
For example, if receiver is "a", method_name is :casecmp, and parameters is ["b"], then you'd be calling
"a".send(:casecmp, "b") {|x| x.inspect}
which is equivalent to (since the block is irrelevant)
"a".casecmp("b")
The Small Eigen Collider runs this code and logs the inputs along with the return value. In this example, most implementations of Ruby return -1, but at one stage Rubinius returned +1. (I filed this as a bug, https://github.com/evanphx/rubinius/issues/518, and the Rubinius maintainers fixed it.)
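To make the shape of a task concrete, here is a minimal sketch of running one task and recording the outcome. The helper name run_task and the layout of the log entry are assumptions for illustration, not the project's actual code.

```ruby
# Hypothetical helper: run a single task and return a log entry.
# The inputs are inspected before the call, because the call may mutate them.
def run_task(receiver, method_name, parameters)
  entry = {
    receiver: receiver.inspect,
    method_name: method_name,
    parameters: parameters.inspect
  }
  block = proc { |o| o.inspect } # the "not very random" block
  entry[:result] = receiver.send(method_name, *parameters, &block).inspect
  entry
rescue StandardError => e
  # An exception is a legitimate outcome too, so record its class.
  entry.merge(error: e.class.name)
end

log_entry = run_task("a", :casecmp, ["b"])
failed_entry = run_task(1, :/, [0])
```

Recording the exception class instead of letting it propagate means one bad task doesn't stop the run, and the outcome can still be compared across implementations.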
Why it needs classes
I want to be able to use class objects in my Small Eigen Collider. Typically, they would be the receiver, but they could also be one of the parameters.
For example, I found that one way to segfault YARV is to do
Thread.kill(nil)
In this case, receiver is the class object Thread, and parameters is [nil]. (Bug report: http://redmine.ruby-lang.org/issues/show/4367 )
Why it needs (de)serialization
The Small Eigen Collider needs serialization for a couple of reasons.
One is that regenerating the tasks from a random number generator on every run isn't practical. JRuby has a different built-in random number generator, so even with the same PRNG seed it would produce different tasks from YARV. Instead, I create the list of random tasks once (on the first run of ruby bin/small_eigen_collider), have that run serialize the list to tasks.yml, and then have subsequent runs of the program (using different Ruby implementations) read tasks.yml to get the same list of tasks.
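The write-once, read-many workflow can be sketched as follows. This is a simplified illustration, assuming tasks made only of plain strings, integers, arrays, and hashes; the hash layout and the temporary file location are made up for the example, and method names are stored as strings so the file loads the same way regardless of Psych version.

```ruby
require "psych"
require "tmpdir"

# Hypothetical task layout: plain data only, so it round-trips cleanly.
tasks = [
  { "receiver" => "a",   "method_name" => "casecmp", "parameters" => ["b"] },
  { "receiver" => "abc", "method_name" => "center",  "parameters" => [7] }
]

# First run: generate the tasks once and write them out.
path = File.join(Dir.mktmpdir, "tasks.yml")
File.write(path, Psych.dump(tasks))

# Later runs (possibly on another Ruby implementation): read and replay.
loaded = Psych.load(File.read(path))
results = loaded.map do |task|
  task["receiver"].send(task["method_name"].to_sym, *task["parameters"])
end
```

Receivers and parameters that are class objects do not fit this plain-data layout, which is where the question above comes from.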
Another reason I need serialization is that I want to be able to edit the list of tasks. If I have a long list of tasks that leads to a segmentation fault, I want to reduce the list to the minimum required to cause a segmentation fault. For example, with the following bug https://github.com/evanphx/rubinius/issues/643 ,
ObjectSpace.undefine_finalizer(:symbol)
by itself doesn't cause a segmentation fault, and nor does
Symbol.all_symbols.inspect
but the two together did. I started out with thousands of tasks, though, and had to pare the list back to just those two.
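That kind of paring back can also be automated once the task list is editable data. Below is a minimal sketch of a one-at-a-time reduction, not the procedure I actually used: the method name minimize and the symbols standing in for tasks are hypothetical, and the block stands in for "does this subset still reproduce the crash?". In practice that check would probably have to run the subset in a separate process, since a segmentation fault kills the interpreter.

```ruby
# Hypothetical reducer: try dropping each task in turn and keep the drop
# whenever the block reports that the failure still reproduces.
def minimize(tasks)
  kept = tasks.dup
  index = 0
  while index < kept.length
    candidate = kept[0...index] + kept[(index + 1)..-1]
    if yield(candidate)
      kept = candidate # the task at index was not needed
    else
      index += 1       # the task at index is required; move on
    end
  end
  kept
end

# Stand-in tasks: the "crash" needs both marked tasks to be present.
tasks = [:noise_1, :undefine_finalizer, :noise_2, :all_symbols_inspect, :noise_3]
minimal = minimize(tasks) do |subset|
  subset.include?(:undefine_finalizer) && subset.include?(:all_symbols_inspect)
end
```

This calls the check about once per task, which is slow for thousands of tasks but simple; removing chunks at a time would cut the number of runs.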
Does deserialization returning existing class objects make sense in this context, or do you think there's a better way?
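For comparison, here is a minimal sketch of one alternative that avoids custom Psych tags entirely: store a class by name inside an ordinary hash and resolve it with Object.const_get after loading. The wrapper format {"class" => name} and the helper names encode_value and decode_value are assumptions made up for this example, and a genuine Hash parameter with that exact shape would be decoded incorrectly.

```ruby
require "psych"

# Hypothetical wrapper format: a named class is stored as {"class" => "Thread"}.
def encode_value(value)
  if value.is_a?(Class) && value.name
    { "class" => value.name }
  else
    value
  end
end

# Reverse of encode_value: look the existing class object up by name.
def decode_value(value)
  if value.is_a?(Hash) && value.keys == ["class"]
    Object.const_get(value["class"])
  else
    value
  end
end

yaml_string = Psych.dump(encode_value(Thread))
restored = decode_value(Psych.load(yaml_string))
```

Because the YAML only ever contains plain hashes and strings, this does not depend on init_with or on patching the ToRuby visitor, at the cost of an extra encode/decode pass over every receiver and parameter.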