Read and write YAML files without destroying anchors and aliases

Posted 2019-01-15 16:39

Question:

This question has been asked before: Read and write YAML files without destroying anchors and aliases?

How can I solve that problem when the file contains many anchors and aliases?

Thanks.

Answer 1:

The problem here is that anchors and aliases in YAML are a serialization detail: they aren't part of the data once it has been parsed, so the original anchor name isn't known when the data is written back out to YAML. To keep the anchor names across a round trip, you need to store them somewhere while parsing so that they are available later when serializing. In Ruby any object can have instance variables associated with it, so an easy way to achieve this is to store the anchor name in an instance variable of the object in question.
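To see the problem concretely, here is a small sketch, assuming a Psych version that provides `Psych.parse` and `Node#to_ruby` (any recent one). After loading, the alias is just a second reference to the same Hash, nothing on that Hash records the name `base`, and dumping produces an anchor label that Psych generates itself. The last lines show the instance-variable storage trick; the ivar name `@_yaml_anchor_name` is the one the answers below use.

```ruby
require 'psych'

yaml = <<~YAML
  base: &base
    adapter: postgres
  copy: *base
YAML

# Node#to_ruby uses the full ToRuby visitor, so aliases are allowed here.
data = Psych.parse(yaml).to_ruby

# The alias became a second reference to the very same Hash object...
same_object = data['copy'].equal?(data['base'])

# ...and nothing on that object remembers the name "base".
ivars_after_load = data['base'].instance_variables

# Dumping still emits an anchor/alias pair, but with a label Psych invents.
dumped = Psych.dump(data)

# Any Ruby object can carry extra state in an instance variable, which is
# where the answers stash the original anchor name.
data['base'].instance_variable_set(:@_yaml_anchor_name, 'base')
stored_name = data['base'].instance_variable_get(:@_yaml_anchor_name)

p same_object       # => true
p ivars_after_load  # => []
puts dumped         # the anchor label is generated, not "base"
p stored_name       # => "base"
```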

Continuing from the example in the earlier question: for hashes, we can change our redefined revive_hash method so that, if the hash has an anchor, it not only records the hash in @st under the anchor name (so that later aliases can be recognised) but also stores the anchor name as an instance variable on the hash.

class ToRubyNoMerge < Psych::Visitors::ToRuby
  def revive_hash hash, o
    if o.anchor
      @st[o.anchor] = hash
      hash.instance_variable_set "@_yaml_anchor_name", o.anchor
    end

    o.children.each_slice(2) { |k,v|
      key = accept(k)
      hash[key] = accept(v)
    }
    hash
  end
end
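A minimal sketch of loading a document through this visitor, assuming Psych 2.0 or later (where `ToRuby.create` exists). It re-declares `ToRubyNoMerge` so the snippet runs on its own; the optional third parameter on `revive_hash` is my adaptation to the signature newer Psych versions use, not part of the answer. It also shows what the "NoMerge" in the name presumably refers to: the stock visitor folds a `<<` merge key into the surrounding hash, while this one keeps `<<` as an ordinary key whose value is the aliased hash.

```ruby
require 'psych'

# Re-declared from the answer so this snippet is self-contained. The optional
# third parameter mirrors newer Psych signatures (an adaptation).
class ToRubyNoMerge < Psych::Visitors::ToRuby
  def revive_hash(hash, o, _tagged = false)
    if o.anchor
      @st[o.anchor] = hash
      hash.instance_variable_set(:@_yaml_anchor_name, o.anchor)
    end

    # Unlike the stock revive_hash, '<<' is stored as a normal key.
    o.children.each_slice(2) do |k, v|
      key = accept(k)
      hash[key] = accept(v)
    end
    hash
  end
end

yaml = <<~YAML
  base: &base
    adapter: postgres
  dev:
    <<: *base
    database: dev_db
YAML

doc = Psych.parse(yaml)

kept   = ToRubyNoMerge.create.accept(doc)
merged = doc.to_ruby # stock visitor, for comparison

p kept['base'].instance_variable_get(:@_yaml_anchor_name) # => "base"
p kept['dev']['<<'].equal?(kept['base'])                  # => true
p merged['dev'].key?('<<')                                # => false
```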

Note that this only affects YAML mappings that carry anchors. If you want other types to keep their anchor names too, look at psych/visitors/to_ruby.rb and make sure the name is added in every case. Most types can be covered by overriding register, but there are a couple of other places; search for @st to find them.
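Following that suggestion, here is a sketch of the register approach, assuming a Psych version (2.0 or later) in which ToRuby routes untagged mappings and sequences through its private `register(node, object)` method. The class name `ToRubyKeepAnchors` is hypothetical. It deliberately tags only Hash and Array values: frozen objects such as Integers cannot take instance variables, and a String carrying an ivar may be dumped differently by some Psych versions.

```ruby
require 'psych'

# Hypothetical subclass: tag every anchored mapping or sequence with its name.
class ToRubyKeepAnchors < Psych::Visitors::ToRuby
  private

  # ToRuby calls register(node, object) as it builds values; node.anchor is
  # the anchor name from the document, or nil when there is none.
  def register(node, object)
    if node.anchor && (object.is_a?(Hash) || object.is_a?(Array))
      object.instance_variable_set(:@_yaml_anchor_name, node.anchor)
    end
    super # keep the normal @st bookkeeping so aliases still resolve
  end
end

yaml = <<~YAML
  defaults: &defaults
    adapter: postgres
  hosts: &hosts
    - db1
    - db2
  production:
    settings: *defaults
    servers: *hosts
YAML

data = ToRubyKeepAnchors.create.accept(Psych.parse(yaml))

p data['defaults'].instance_variable_get(:@_yaml_anchor_name) # => "defaults"
p data['hosts'].instance_variable_get(:@_yaml_anchor_name)    # => "hosts"
p data['production']['servers'].equal?(data['hosts'])          # => true
```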

Now that the hash has the desired anchor name associated with it, you need to make Psych use that name instead of the object id when serializing it. This can be done by subclassing YAMLTree. When YAMLTree processes an object, it first checks whether it has seen that object already, and emits an alias for it if it has. For every new object, it records that it has seen the object in case it needs to create an alias later. The object_id is used as the key for this bookkeeping, so you need to override those two methods (accept and register) to check for the instance variable and use its value instead when it exists:

class MyYAMLTree < Psych::Visitors::YAMLTree

  # check to see if this object has been seen before
  def accept target
    if anchor_name = target.instance_variable_get('@_yaml_anchor_name')
      if @st.key? anchor_name
        oid         = anchor_name
        node        = @st[oid]
        anchor      = oid.to_s
        node.anchor = anchor
        return @emitter.alias anchor
      end
    end

    # accept is a pretty big method, call super to avoid copying
    # it all here. super will handle the cases when it's an object
    # that's been seen but doesn't have '@_yaml_anchor_name' set
    super
  end

  # record object for future, using '@_yaml_anchor_name' rather
  # than object_id if it exists
  def register target, yaml_obj
    anchor_name = target.instance_variable_get('@_yaml_anchor_name') || target.object_id
    @st[anchor_name] = yaml_obj
    yaml_obj
  end
end
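To make the keying rule in accept and register concrete, here is a plain-Ruby illustration that does not touch Psych at all; `seen_key` and `classify` are hypothetical helpers. The stored anchor name is used as the key when present, otherwise the object_id. It also shows a consequence worth knowing: `Object#dup` copies instance variables, so a duplicated (and possibly modified) hash carries the same anchor name and would be emitted as an alias of the original.

```ruby
# Hypothetical helper mirroring the rule in accept/register: prefer the
# stored anchor name, otherwise fall back to the object's identity.
def seen_key(obj)
  obj.instance_variable_get(:@_yaml_anchor_name) || obj.object_id
end

# Walk objects in order and report whether each would be emitted as new
# content or as an alias of something already registered.
def classify(objects)
  seen = {}
  objects.map do |obj|
    key = seen_key(obj)
    if seen.key?(key)
      "alias *#{key}"
    else
      seen[key] = obj
      'new'
    end
  end
end

base = { 'adapter' => 'postgres' }
base.instance_variable_set(:@_yaml_anchor_name, 'base')
plain = { 'adapter' => 'postgres' } # equal content, but a different object

p classify([base, plain, base]) # => ["new", "new", "alias *base"]

# dup copies instance variables, so the copy claims the same anchor name.
copy = base.dup
copy['adapter'] = 'mysql'
p classify([base, copy])        # => ["new", "alias *base"]
```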

Now you can use MyYAMLTree like this (unlike in the previous question, you don't need to create a custom emitter here):

# data: the Ruby object loaded with ToRubyNoMerge
builder = MyYAMLTree.new
builder << data

tree = builder.tree

puts tree.yaml # returns a string

# alternatively, write directly to a file ('w' creates or truncates it):
File.open('a_file.yml', 'w') do |f|
  tree.yaml f
end
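`MyYAMLTree.new` and the Hash-style `@st` above appear to target older Psych internals; the answers below adapt to the `Registrar` introduced in Psych 2.0. As an alternative (not a reproduction of this answer), here is a sketch of the whole round trip for recent Psych that avoids touching `@st`: the YAMLTree subclass keeps its own name-to-node table through the private `register` hook and is built with `YAMLTree.create`. All class and method names here (`AnchorKeepingToRuby`, `AnchorKeepingYAMLTree`, `anchor_nodes`, `anchor_name_of`) are hypothetical, and it still relies on Psych internals that may change between versions.

```ruby
require 'psych'

# Hypothetical loader: remember anchor names on mappings and sequences.
class AnchorKeepingToRuby < Psych::Visitors::ToRuby
  private

  def register(node, object)
    if node.anchor && (object.is_a?(Hash) || object.is_a?(Array))
      object.instance_variable_set(:@_yaml_anchor_name, node.anchor)
    end
    super
  end
end

# Hypothetical dumper: reuse the remembered names for anchors and aliases.
class AnchorKeepingYAMLTree < Psych::Visitors::YAMLTree
  def accept(target)
    name = anchor_name_of(target)
    # A named object that was already emitted becomes an alias to that name.
    return @emitter.alias(name) if name && anchor_nodes.key?(name)

    super # everything else, including unnamed repeats, uses Psych's logic
  end

  private

  def anchor_nodes
    @anchor_nodes ||= {}
  end

  def anchor_name_of(target)
    target.instance_variable_get(:@_yaml_anchor_name)
  rescue NoMethodError
    nil # e.g. BasicObject instances
  end

  # YAMLTree calls register(target, node) when it starts a mapping/sequence.
  def register(target, yaml_obj)
    name = anchor_name_of(target)
    if name
      yaml_obj.anchor = name # label the node even if it is never aliased
      anchor_nodes[name] = yaml_obj
    end
    super
  end
end

yaml = <<~YAML
  defaults: &defaults
    adapter: postgres
  hosts: &hosts
    - db1
    - db2
  production:
    settings: *defaults
    servers: *hosts
YAML

data = AnchorKeepingToRuby.create.accept(Psych.parse(yaml))

builder = AnchorKeepingYAMLTree.create
builder << data
output = builder.tree.yaml
puts output
```

Setting `node.anchor` inside `register`, rather than only when the first alias is met, also keeps anchors that are defined but never aliased. Like the stock loader, this sketch still merges `<<` keys on load, so merge keys are not preserved here.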


Answer 2:

Here's a slightly modified version for newer versions of the psych gem. Before, it gave me the following error:

NoMethodError - undefined method `[]=' for #<Psych::Visitors::YAMLTree::Registrar:0x007fa0db6ba4d0>

The register method moved out of YAMLTree into Registrar, a helper class nested inside it (Psych::Visitors::YAMLTree::Registrar), so this now works along the lines of everything matt says in his answer:

class ToRubyNoMerge < Psych::Visitors::ToRuby
  def revive_hash hash, o
    if o.anchor
      @st[o.anchor] = hash
      hash.instance_variable_set "@_yaml_anchor_name", o.anchor
    end

    o.children.each_slice(2) { |k,v|
      key = accept(k)
      hash[key] = accept(v)
    }
    hash
  end
end

class MyYAMLTree < Psych::Visitors::YAMLTree
  class Registrar
    # record object for future, using '@_yaml_anchor_name' rather
    # than object_id if it exists
    def register target, node
      anchor_name = target.instance_variable_get('@_yaml_anchor_name') || target.object_id
      @obj_to_node[anchor_name] = node
    end
  end

  # check to see if this object has been seen before
  def accept target
    if anchor_name = target.instance_variable_get('@_yaml_anchor_name')
      if @st.key? anchor_name
        oid         = anchor_name
        node        = @st[oid]
        anchor      = oid.to_s
        node.anchor = anchor
        return @emitter.alias anchor
      end
    end

    # accept is a pretty big method, call super to avoid copying
    # it all here. super will handle the cases when it's an object
    # that's been seen but doesn't have '@_yaml_anchor_name' set
    super
  end

end
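One Ruby detail matters when adapting the code above: a `class Registrar` statement inside the body of a YAMLTree subclass looks for an existing `Registrar` constant only in the subclass itself, not in its superclass. It therefore defines a brand-new `MyYAMLTree::Registrar` rather than reopening Psych's class, and YAMLTree (which builds `@st` from its own `Registrar` constant) never uses it. That is presumably why Answer 3 reopens `Psych::Visitors::YAMLTree::Registrar` by its full name. A quick check, using the hypothetical class name `SubTree`:

```ruby
require 'psych'

# Hypothetical subclass reproducing the pattern from the answer above.
class SubTree < Psych::Visitors::YAMLTree
  # Looks like a reopen of Psych's Registrar, but Ruby only checks SubTree
  # itself for an existing constant, so this defines SubTree::Registrar.
  class Registrar
    def register(_target, _node)
      raise 'SubTree::Registrar#register was called'
    end
  end
end

same_class = SubTree::Registrar.equal?(Psych::Visitors::YAMLTree::Registrar)
p same_class # => false

# YAMLTree builds @st from Psych's own Registrar, so the override above is
# never used and dumping runs with the stock object_id-based bookkeeping.
builder = SubTree.create
builder << { 'a' => 1 }
dumped = builder.tree.yaml
puts dumped
```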


Answer 3:

I had to modify the code that @markus posted further to get it working with Psych v2.0.17.

Here's what I ended up with. I hope it helps someone else save quite a bit of time. :-)

class ToRubyNoMerge < Psych::Visitors::ToRuby
  def revive_hash hash, o
    if o.anchor
      @st[o.anchor] = hash
      hash.instance_variable_set "@_yaml_anchor_name", o.anchor
    end

    o.children.each_slice(2) do |k,v|
      key = accept(k)
      hash[key] = accept(v)
    end
    hash
  end
end

class Psych::Visitors::YAMLTree::Registrar
  # record object for future, using '@_yaml_anchor_name' rather
  # than object_id if it exists
  def register target, node
    @targets << target
    @obj_to_node[_anchor_name(target)] = node
  end

  def key? target
    @obj_to_node.key? _anchor_name(target)
  rescue NoMethodError
    false
  end

  def node_for target
    @obj_to_node[_anchor_name(target)]
  end

  private

  def _anchor_name(target)
    target.instance_variable_get('@_yaml_anchor_name') || target.object_id
  end
end

class MyYAMLTree < Psych::Visitors::YAMLTree
  # check to see if this object has been seen before
  def accept target
    if anchor_name = target.instance_variable_get('@_yaml_anchor_name')
      if @st.key? target
        node        = @st.node_for target
        node.anchor = anchor_name
        return @emitter.alias anchor_name
      end
    end

    # accept is a pretty big method, call super to avoid copying
    # it all here. super will handle the cases when it's an object
    # that's been seen but doesn't have '@_yaml_anchor_name' set
    super
  end

  def visit_String o
    if o == '<<'
      style = Psych::Nodes::Scalar::PLAIN
      tag   = 'tag:yaml.org,2002:str'
      plain = true
      quote = false

      return @emitter.scalar o, nil, tag, plain, quote, style
    end

    # visit_String is a pretty big method, call super to avoid copying it all
    # here. super will handle the cases when it's a string other than '<<'
    super
  end
end
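The `visit_String` override matters because Psych's YAMLTree does not emit a string key `'<<'` as a plain scalar, so that it won't be read back as a merge key. When the goal is to write back the merge keys that `ToRubyNoMerge` kept, `<<` has to be emitted plain. A small sketch comparing the two, with a hypothetical class `PlainMergeKeyTree` that contains only that override:

```ruby
require 'psych'

# Hypothetical class holding only Answer 3's visit_String override.
class PlainMergeKeyTree < Psych::Visitors::YAMLTree
  def visit_String(o)
    if o == '<<'
      # plain_implicit = true lets the emitter omit the tag, so the key is
      # written as a bare << that YAML readers treat as a merge key.
      return @emitter.scalar(o, nil, 'tag:yaml.org,2002:str', true, false,
                             Psych::Nodes::Scalar::PLAIN)
    end
    super
  end
end

base = { 'adapter' => 'postgres' }
data = { 'base' => base, 'dev' => { '<<' => base, 'database' => 'dev_db' } }

default_yaml = Psych.dump(data)

builder = PlainMergeKeyTree.create
builder << data
plain_yaml = builder.tree.yaml

puts default_yaml # << is not written as a plain key here
puts plain_yaml   # contains a plain "<<:" merge key pointing at the alias
```

Reloading `plain_yaml` with the stock loader merges `base` into `dev` again, which is the intended effect of keeping the merge key.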