How to flatten a hash, making each key a unique va

2019-06-15 07:11发布

问题:

I want to take a hash with nested hashes and arrays and flatten it out into a single hash with unique values. I keep trying to approach this from different angles, but then I make it way more complex than it needs to be and get myself lost in what's happening.

Example Source Hash:

{
  "Name" => "Kim Kones",
  "License Number" => "54321",
  "Details" => {
    "Name" => "Kones, Kim",
    "Licenses" => [
      {
        "License Type" => "PT",
        "License Number" => "54321"
      },
      {
        "License Type" => "Temp",
        "License Number" => "T123"
      },
      {
        "License Type" => "AP",
        "License Number" => "A666",
        "Expiration Date" => "12/31/2020"
      }
    ]
  }
}

Example Desired Hash:

{
  "Name" => "Kim Kones",
  "License Number" => "54321",
  "Details_Name" => "Kones, Kim",
  "Details_Licenses_1_License Type" => "PT",
  "Details_Licenses_1_License Number" => "54321",
  "Details_Licenses_2_License Type" => "Temp",
  "Details_Licenses_2_License Number" => "T123",
  "Details_Licenses_3_License Type" => "AP",
  "Details_Licenses_3_License Number" => "A666",
  "Details_Licenses_3_Expiration Date" => "12/31/2020"
}

For what it's worth, here's my most recent attempt before giving up.

def flattify(hashy)
    temp = {}
    hashy.each do |key, val|
        if val.is_a? String
            temp["#{key}"] = val
        elsif val.is_a? Hash
            temp.merge(rename val, key, "")
        elsif val.is_a? Array
            temp["#{key}"] = enumerate val, key
        else
        end
        print "=> #{temp}\n"
    end
    return temp
end

def rename (hashy, str, n)
    temp = {}
    hashy.each do |key, val|
        if val.is_a? String
            temp["#{key}#{n}"] = val
        elsif val.is_a? Hash
            val.each do |k, v|
                temp["#{key}_#{k}#{n}"] = v
            end
        elsif val.is_a? Array
            temp["#{key}"] = enumerate val, key
        else

        end
    end
    return flattify temp
end

def enumerate (ary, str)
    temp = {}
    i = 1
    ary.each do |x|
        temp["#{str}#{i}"] = x
        i += 1
    end
    return flattify temp
end

回答1:

Interesting question!

Theory

Here's a recursive method to parse your data.

  • It keeps track of which keys and indices it has found.
  • It appends them in a tmp array.
  • Once a leaf object has been found, it gets written in a hash as value, with a joined tmp as key.
  • This small hash then gets recursively merged back to the main hash.

Code

def recursive_parsing(object, tmp = [])
  case object
  when Array
    object.each.with_index(1).with_object({}) do |(element, i), result|
      result.merge! recursive_parsing(element, tmp + [i])
    end
  when Hash
    object.each_with_object({}) do |(key, value), result|
      result.merge! recursive_parsing(value, tmp + [key])
    end
  else
    { tmp.join('_') => object }
  end
end

As an example:

require 'pp'
pp recursive_parsing(data)
# {"Name"=>"Kim Kones",
#  "License Number"=>"54321",
#  "Details_Name"=>"Kones, Kim",
#  "Details_Licenses_1_License Type"=>"PT",
#  "Details_Licenses_1_License Number"=>"54321",
#  "Details_Licenses_2_License Type"=>"Temp",
#  "Details_Licenses_2_License Number"=>"T123",
#  "Details_Licenses_3_License Type"=>"AP",
#  "Details_Licenses_3_License Number"=>"A666",
#  "Details_Licenses_3_Expiration Date"=>"12/31/2020"}

Debugging

Here's a modified version with old-school debugging. It might help you understand what's going on:

def recursive_parsing(object, tmp = [], indent="")
  puts "#{indent}Parsing #{object.inspect}, with tmp=#{tmp.inspect}"
  result = case object
  when Array
    puts "#{indent} It's an array! Let's parse every element:"
    object.each_with_object({}).with_index(1) do |(element, result), i|
      result.merge! recursive_parsing(element, tmp + [i], indent + "  ")
    end
  when Hash
    puts "#{indent} It's a hash! Let's parse every key,value pair:"
    object.each_with_object({}) do |(key, value), result|
      result.merge! recursive_parsing(value, tmp + [key], indent + "  ")
    end
  else
    puts "#{indent} It's a leaf! Let's return a hash"
    { tmp.join('_') => object }
  end
  puts "#{indent} Returning #{result.inspect}\n"
  result
end

When called with recursive_parsing([{a: 'foo', b: 'bar'}, {c: 'baz'}]), it displays:

Parsing [{:a=>"foo", :b=>"bar"}, {:c=>"baz"}], with tmp=[]
 It's an array! Let's parse every element:
  Parsing {:a=>"foo", :b=>"bar"}, with tmp=[1]
   It's a hash! Let's parse every key,value pair:
    Parsing "foo", with tmp=[1, :a]
     It's a leaf! Let's return a hash
     Returning {"1_a"=>"foo"}
    Parsing "bar", with tmp=[1, :b]
     It's a leaf! Let's return a hash
     Returning {"1_b"=>"bar"}
   Returning {"1_a"=>"foo", "1_b"=>"bar"}
  Parsing {:c=>"baz"}, with tmp=[2]
   It's a hash! Let's parse every key,value pair:
    Parsing "baz", with tmp=[2, :c]
     It's a leaf! Let's return a hash
     Returning {"2_c"=>"baz"}
   Returning {"2_c"=>"baz"}
 Returning {"1_a"=>"foo", "1_b"=>"bar", "2_c"=>"baz"}


回答2:

Unlike the others, I have no love for each_with_object :-). But I do like passing a single result hash around so I don't have to merge and remerge hashes over and over again.

def flattify(value, result = {}, path = [])
  case value
  when Array
    value.each.with_index(1) do |v, i|
      flattify(v, result, path + [i])
    end
  when Hash
    value.each do |k, v|
      flattify(v, result, path + [k])
    end
  else
    result[path.join("_")] = value
  end
  result
end

(Some details adopted from Eric, see comments)



回答3:

Non-recursive approach, using BFS with an array as a queue. I keep the key-value pairs where the value isn't an array/hash, and push array/hash contents to the queue (with combined keys). Turning arrays into hashes (["a", "b"]{1=>"a", 2=>"b"}) as that felt neat.

def flattify(hash)
  (q = hash.to_a).select { |key, value|
    value = (1..value.size).zip(value).to_h if value.is_a? Array
    !value.is_a?(Hash) || !value.each { |k, v| q << ["#{key}_#{k}", v] }
  }.to_h
end

One thing I like about it is the nice combination of keys as "#{key}_#{k}". In my other solution, I could've also used a string path = '' and extended that with path + "_" + k, but that would've caused a leading underscore that I'd have to avoid or trim with extra code.