Parsing XML to hash with Nori and Nokogiri with un

Posted 2019-07-28 02:00

I am attempting to convert an XML document to a Ruby hash using Nori. Instead of getting the collection directly under the root element, I get an extra node that wraps the collection. This is what I am doing:

@xml  = content_for(:layout)
@hash = Nori.new(:parser => :nokogiri, :advanced_typecasting => false).parse(@xml)

or

@hash = Hash.from_xml(@xml)

Where the content of @xml is:

<bundles>
  <bundle>
    <id>6073</id>
    <name>Bundle-1</name>
    <status>1</status>
    <bundle_type>
      <id>6713</id>
      <name>BundleType-1</name>
    </bundle_type>
    <begin_at nil="true"></begin_at>
    <end_at nil="true"></end_at>
    <updated_at>2013-03-21T23:02:32Z</updated_at>
    <created_at>2013-03-21T23:02:32Z</created_at>
  </bundle>
  <bundle>
    <id>6074</id>
    <name>Bundle-2</name>
    <status>1</status>
    <bundle_type>
      <id>6714</id>
      <name>BundleType-2</name>
    </bundle_type>
    <begin_at nil="true"></begin_at>
    <end_at nil="true"></end_at>
    <updated_at>2013-03-21T23:02:32Z</updated_at>
    <created_at>2013-03-21T23:02:32Z</created_at>
  </bundle>
</bundles>

The parser returns @hash in this format:

{"bundles"=>{"bundle"=>[{"id"=>"6073", "name"=>"Bundle-1", "status"=>"1", "bundle_type"=>{"id"=>"6713", "name"=>"BundleType-1"}, "begin_at"=>nil, "end_at"=>nil, "updated_at"=>"2013-03-21T23:02:32Z", "created_at"=>"2013-03-21T23:02:32Z"}, {"id"=>"6074", "name"=>"Bundle-2", "status"=>"1", "bundle_type"=>{"id"=>"6714", "name"=>"BundleType-2"}, "begin_at"=>nil, "end_at"=>nil, "updated_at"=>"2013-03-21T23:02:32Z", "created_at"=>"2013-03-21T23:02:32Z"}]}} 

Instead I would like to get:

{"bundles"=>[{"id"=>"6073", "name"=>"Bundle-1", "status"=>"1", "bundle_type"=>{"id"=>"6713", "name"=>"BundleType-1"}, "begin_at"=>nil, "end_at"=>nil, "updated_at"=>"2013-03-21T23:02:32Z", "created_at"=>"2013-03-21T23:02:32Z"}, {"id"=>"6074", "name"=>"Bundle-2", "status"=>"1", "bundle_type"=>{"id"=>"6714", "name"=>"BundleType-2"}, "begin_at"=>nil, "end_at"=>nil, "updated_at"=>"2013-03-21T23:02:32Z", "created_at"=>"2013-03-21T23:02:32Z"}]}

The point is that I control the XML, which is formed similarly to the example above.

My question is also related to "Does RABL's JSON output not conform to standard? Can it?"

2 answers
Fickle 薄情
#2 · 2019-07-28 02:06

The general solution to your problem is not very pretty.

I created a special object that I named ArrayHash. It has the special property that if it has only one key, and the value under that key is an array, it adds integer keys for the elements of that array.

So if a normal Ruby Hash would look like

{"bundle"=>["0", "1", "A", "B"]}

then an ArrayHash would look like this

{"bundle"=>["0", "1", "A", "B"], 0=>"0", 1=>"1", 2=>"A", 3=>"B"}

Since the extra keys are Integers (Fixnum in older Ruby versions), this Hash looks just like the Array

[ "0", "1", "A", "B" ]

except that it also has a "bundle" entry so its size is 5

Below is the code to force Nori to use this special Dictionary.

require 'nori'

class Nori
  class ArrayHash < Hash
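    # Integer lookups on a single-key hash are forwarded to the array under that key.
    # Integer replaces Fixnum, which was removed in Ruby 3.2.
    def [](a)
      if a.is_a?(Integer) && size == 1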
        key = self.keys[0]
        self[key][a]
      else
        super
      end
    end
    def inspect
      if self.size == 1 and self.to_a[0][1].class == Array
        p = Hash[self.to_a]
        self.values[0].each.with_index do |v, i|
          p[i] = v
        end
        p.inspect
      else
        super
      end
    end
  end
end

class Nori
  class XMLUtilityNode
    alias :old_to_hash :to_hash
    def to_hash
      ret = old_to_hash
      raise if ret.size != 1
      raise unless ret.class == Hash
      a = ret.to_a[0]
      k, v = a.first, a.last
      if v.class == Hash
        v = ArrayHash[ v.to_a ]
      end
      ret = ArrayHash[ k, v ]
      ret
    end
  end
end


h = Nori.new(:parser => :nokogiri, :advanced_typecasting => false).parse(<<EOF)
<top>
<aundles>
  <bundle>0</bundle>
  <bundle>1</bundle>
  <bundle>A</bundle>
  <bundle>B</bundle>
</aundles>
<bundles>
  <nundle>A</nundle>
  <bundle>A</bundle>
  <bundle>B</bundle>
</bundles>
</top>
EOF

puts "#{h['top']['aundles'][0]} == #{ h['top']['aundles']['bundle'][0]}"
puts "#{h['top']['aundles'][1]} == #{ h['top']['aundles']['bundle'][1]}"
puts "#{h['top']['aundles'][2]} == #{ h['top']['aundles']['bundle'][2]}"
puts "#{h['top']['aundles'][3]} == #{ h['top']['aundles']['bundle'][3]}"

puts h.inspect

The output is then

0 == 0
1 == 1
A == A
B == B
{"top"=>{"aundles"=>{"bundle"=>["0", "1", "A", "B"], 0=>"0", 1=>"1", 2=>"A", 3=>"B"}, "bundles"=>{"nundle"=>"A", "bundle"=>["A", "B"]}}}
Rolldiameter
#3 · 2019-07-28 02:24

Imagine an XML document that consists only of a list of identical tags, e.g.

<shoppinglist>
    <item>apple</item>
    <item>banana</item>
    <item>cherry</item>
    <item>pear</item>
</shoppinglist>

When you convert this into a hash, it is quite straightforward to access the items with e.g. hash['shoppinglist']['item'][0]. But what would you expect in this case? Just an array? According to your logic, the items should now be accessible with hash['shoppinglist'][0]. But what if you have different elements inside the container, e.g.

<shoppinglist>
    <date>2013-01-01</date>
    <item>apple</item>
    <item>banana</item>
    <item>cherry</item>
    <item>pear</item>
</shoppinglist>

How would you now access the items? And the date? The problem is that the conversion to a hash has to work in the general case.
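
To make the ambiguity concrete, here is a small sketch in plain Ruby. The two hashes are written by hand to mirror the shapes a parser like Nori would be expected to return for the two shopping lists (an assumption, so the snippet runs without any gem), and only_child_array? is a hypothetical helper name, not part of any library.

```ruby
# Hand-written stand-ins for the parsed shopping lists (assumed shapes).
uniform = { "shoppinglist" => { "item" => %w[apple banana cherry pear] } }
mixed   = { "shoppinglist" => { "date" => "2013-01-01",
                                "item" => %w[apple banana cherry pear] } }

# Hypothetical check: a container can be replaced by its array only when
# that array is its sole child.
def only_child_array?(container)
  container.is_a?(Hash) &&
    container.size == 1 &&
    container.values.first.is_a?(Array)
end

puts only_child_array?(uniform["shoppinglist"]) # true
puts only_child_array?(mixed["shoppinglist"])   # false: "date" would be lost
```

The second case is the one a general-purpose converter cannot resolve on its own, which is why the wrapping key is kept.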

Although I do not know Nori, I am pretty sure that what you ask of it is not baked in, simply because it makes no sense when you consider the general case. As an alternative, you can still move the bundle array up one level yourself:

@hash['bundles'] = @hash['bundles']['bundle']
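
If you need this in more than one place, the one-liner could be wrapped in a small helper. This is only a sketch: hoist is a hypothetical name, not something Nori provides, and the input hash is hand-written to resemble the parser output from the question. It also assumes that a container with a single child comes back as a bare value rather than a one-element array, so that case is wrapped explicitly.

```ruby
# Hypothetical helper (not part of Nori): replace hash[container] with the
# list stored under hash[container][item].
def hoist(hash, container, item)
  inner = hash[container]
  # Leave the hash alone unless the container holds nothing but `item`.
  return hash unless inner.is_a?(Hash) && inner.keys == [item]

  children = inner[item]
  # A single child is assumed to arrive as a bare value, so wrap it.
  children = [children] unless children.is_a?(Array)
  hash.merge(container => children)
end

# Shortened stand-in for the parsed XML from the question.
parsed = { "bundles" => { "bundle" => [{ "id" => "6073" }, { "id" => "6074" }] } }

result = hoist(parsed, "bundles", "bundle")
p result["bundles"].map { |b| b["id"] } # ["6073", "6074"]
```

Because it returns a new hash via merge, the original parser output is left untouched, and containers with mixed children are returned unchanged.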