Compare arrays of hashes and print expected & actual

Posted 2019-07-15 04:43

Question:

I have two arrays of hashes:

actual = [{"column_name"=>"NONINTERESTINCOME", "column_data_type"=>"NUMBER"},
 {"column_name"=>"NONINTERESTEXPENSE", "column_data_type"=>"VARCHAR"},
 {"column_name"=>"TRANSACTIONDATE", "column_data_type"=>"TIMESTAMP"},
 {"column_name"=>"UPDATEDATE", "column_data_type"=>"TIMESTAMP"}]
expected = [{"column_name"=>"NONINTERESTINCOME", "column_data_type"=>"NUMBER"},
 {"column_name"=>"NONINTERESTEXPENSE", "column_data_type"=>"NUMBER"},
 {"column_name"=>"TRANSACTIONDATE", "column_data_type"=>"NUMBER"},
 {"column_name"=>"UPDATEDATE", "column_data_type"=>"TIMESTAMP"}]

I need to compare these two arrays and find the entries whose column_data_type differs.

To compare them, we can use array difference directly:

diff = actual - expected

This gives:

{"column_name"=>"NONINTERESTEXPENSE", "column_data_type"=>"VARCHAR"}
{"column_name"=>"TRANSACTIONDATE", "column_data_type"=>"TIMESTAMP"}

In the result I want to print both the actual and the expected data type for each mismatched `column_name`, taken from the two arrays of hashes. Something like:

{"column_name"=>"NONINTERESTEXPENSE", "expected_column_data_type"=>"NUMBER", "actual_column_data_type" => "VARCHAR"}
{"column_name"=>"TRANSACTIONDATE", "expected_column_data_type"=>"NUMBER","actual_column_data_type" => "TIMESTAMP" }

Answer 1:

(expected - actual).
  concat(actual - expected).
  group_by { |column| column['column_name'] }.
  map do |name, (expected, actual)|
    {
      'column_name'               => name,
      'expected_column_data_type' => expected['column_data_type'],
      'actual_column_data_type'   => actual['column_data_type'],
    }
  end
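
To see why the block parameters line up, here is a sketch of the same pipeline split into steps. The two-column sample data and the intermediate variable names are made up for illustration. Because the elements from expected are placed first, each group comes out as [expected_row, actual_row].

```ruby
# Illustrative data: column X differs in type, column Y matches.
expected = [
  { 'column_name' => 'X', 'column_data_type' => 'NUMBER' },
  { 'column_name' => 'Y', 'column_data_type' => 'TIMESTAMP' }
]
actual = [
  { 'column_name' => 'X', 'column_data_type' => 'VARCHAR' },
  { 'column_name' => 'Y', 'column_data_type' => 'TIMESTAMP' }
]

only_expected = expected - actual # rows of expected with no identical row in actual
only_actual   = actual - expected # rows of actual with no identical row in expected

# Expected rows come first, so every group is [expected_row, actual_row].
grouped = (only_expected + only_actual).group_by { |column| column['column_name'] }

result = grouped.map do |name, (exp, act)|
  {
    'column_name'               => name,
    'expected_column_data_type' => exp['column_data_type'],
    'actual_column_data_type'   => act['column_data_type']
  }
end
```

Note that this relies on every mismatched column being present in both arrays. If a column exists on only one side, its group has a single element, the second block parameter is nil, and the lookup raises NoMethodError.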


Answer 2:

This works regardless of the order of the hashes in each array.

diff = []

expected.each do |elem|
  column_name = elem['column_name']
  column_type = elem['column_data_type']
  match = actual.detect { |elem2| elem2['column_name'] == column_name  }
  if column_type != match['column_data_type']
    diff << { 'column_name' => column_name,
              'expected_column_data_type' => column_type,
              'actual_column_data_type' => match['column_data_type'] }
  end
end

p diff
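
One caveat with the loop above: if a column in expected has no counterpart in actual, detect returns nil and match['column_data_type'] raises NoMethodError. Below is a minimal sketch that indexes actual by name first and reports a missing column with a nil actual type. The helper name type_mismatches and the nil convention are assumptions, not part of the original answer.

```ruby
# Hypothetical helper: report columns whose data type differs between
# expected and actual, tolerating columns that are absent from actual.
def type_mismatches(expected, actual)
  # Index actual by column name so each lookup is a hash access
  # instead of a linear scan.
  actual_types = actual.map { |h| [h['column_name'], h['column_data_type']] }.to_h

  expected.each_with_object([]) do |elem, diff|
    name          = elem['column_name']
    expected_type = elem['column_data_type']
    actual_type   = actual_types[name] # nil when the column is missing

    next if expected_type == actual_type

    diff << { 'column_name' => name,
              'expected_column_data_type' => expected_type,
              'actual_column_data_type' => actual_type }
  end
end
```

Calling type_mismatches(expected, actual) with the arrays from the question should give the same two entries as the loop above.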


Answer 3:

[actual, expected].map { |a| a.map(&:dup).map(&:values) }
                  .map(&Hash.method(:[]))
                  .reduce do |actual, expected|
                    actual.merge(expected) do |k, o, n|
                      o == n ? nil : {name: k, actual: o, expected: n}
                    end
                  end.values.compact

#⇒ [
#    [0] {
#            :name => "NONINTERESTEXPENSE",
#          :actual => "VARCHAR",
#        :expected => "NUMBER"
#    },
#    [1] {
#            :name => "TRANSACTIONDATE",
#          :actual => "TIMESTAMP",
#        :expected => "NUMBER"
#    }
# ]

The method above is easily extended to merge N arrays (use reduce.with_index and merge with the key "value_from_#{idx}").
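
As a rough illustration of the N-array idea, here is one possible sketch. It does not use reduce.with_index literally; instead it builds one name-to-type lookup per source and collects the types under "value_from_#{idx}" keys. The method name diff_many and the output shape are assumptions made for this example.

```ruby
# Hypothetical generalisation to N arrays of column hashes.
# Returns one entry per column whose types are not identical across all
# sources; each entry records the type seen in every source under
# "value_from_<idx>" (nil if the column is absent from that source).
def diff_many(*sources)
  # Turn each array of hashes into { column_name => column_data_type }.
  lookups = sources.map do |rows|
    rows.map { |h| [h['column_name'], h['column_data_type']] }.to_h
  end

  # Every column name seen in any source, in first-seen order.
  names = lookups.flat_map(&:keys).uniq

  names.each_with_object([]) do |name, out|
    types = lookups.map { |lookup| lookup[name] }
    next if types.uniq.size == 1 # all sources agree

    entry = { 'name' => name }
    types.each_with_index { |type, idx| entry["value_from_#{idx}"] = type }
    out << entry
  end
end
```

With two sources, diff_many(actual, expected) reports the same mismatched columns as the merge above, with the types stored under "value_from_0" and "value_from_1".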



Answer 4:

What about this?

# Return the first hash with the given column name (nil if none).
# Named find_column rather than select so that a top-level definition
# does not shadow Kernel#select.
def find_column(hashes_array, column_name)
  hashes_array.find { |h| h["column_name"] == column_name }
end

diff = (expected - actual).map do |h|
  {
    "column_name" => h["column_name"],
    "expected_column_data_type" => find_column(expected, h["column_name"])["column_data_type"],
    "actual_column_data_type" => find_column(actual, h["column_name"])["column_data_type"],
  }
end

PS: This code can surely be made more elegant.



Answer 5:

Code

def convert(actual, expected)
  hashify(actual-expected, "actual_data_type").
  merge(hashify(expected-actual, "expected_data_type")) { |_,a,e| a.merge(e) }.values
end

def hashify(arr, key)
  arr.each_with_object({}) { |g,h| h[g["column_name"]] =
    { "column_name"=>g["column_name"], key=>g["column_data_type"] } }
end

Example

actual = [
  {"column_name"=>"TRANSACTIONDATE", "column_data_type"=>"TIMESTAMP"},
  {"column_name"=>"NONINTERESTEXPENSE", "column_data_type"=>"VARCHAR"},
  {"column_name"=>"NONINTERESTINCOME", "column_data_type"=>"NUMBER"},
  {"column_name"=>"UPDATEDATE", "column_data_type"=>"TIMESTAMP"}
]

expected = [
  {"column_name"=>"NONINTERESTINCOME", "column_data_type"=>"NUMBER"},
  {"column_name"=>"NONINTERESTEXPENSE", "column_data_type"=>"NUMBER"},
  {"column_name"=>"TRANSACTIONDATE", "column_data_type"=>"NUMBER"},
  {"column_name"=>"UPDATEDATE", "column_data_type"=>"TIMESTAMP"}
]

convert(actual, expected)
  #=> [{"column_name"=>"TRANSACTIONDATE",
  #     "actual_data_type"=>"TIMESTAMP", "expected_data_type"=>"NUMBER"},
  #    {"column_name"=>"NONINTERESTEXPENSE",
  #     "actual_data_type"=>"VARCHAR", "expected_data_type"=>"NUMBER"}] 

Explanation

For the example above the steps are as follows.

First compute the two array differences, actual-expected and expected-actual, and hashify each.

f = actual-expected
  #=> [{"column_name"=>"TRANSACTIONDATE", "column_data_type"=>"TIMESTAMP"},
  #    {"column_name"=>"NONINTERESTEXPENSE", "column_data_type"=>"VARCHAR"}]

g = hashify(f, "actual_data_type")
  #=> {"TRANSACTIONDATE"=>{"column_name"=>"TRANSACTIONDATE",
  #      "actual_data_type"=>"TIMESTAMP"},
  #    "NONINTERESTEXPENSE"=>{ "column_name"=>"NONINTERESTEXPENSE",
  #      "actual_data_type"=>"VARCHAR"}}

h = expected-actual
  #=> [{"column_name"=>"NONINTERESTEXPENSE", "column_data_type"=>"NUMBER"},
  #    {"column_name"=>"TRANSACTIONDATE", "column_data_type"=>"NUMBER"}]

i = hashify(h, "expected_data_type")
  #=> {"NONINTERESTEXPENSE"=>{"column_name"=>"NONINTERESTEXPENSE",
  #      "expected_data_type"=>"NUMBER"},
  #    "TRANSACTIONDATE"=>{"column_name"=>"TRANSACTIONDATE",
  #      "expected_data_type"=>"NUMBER"}}

Next, merge g and i using the form of Hash#merge that takes a block to determine the values of keys present in both hashes being merged. See the Hash#merge documentation for the definitions of the three block variables. The first of these, the common key, is represented by an underscore to signify that it is not used in the block calculation.

j = g.merge(i) { |_,a,e| a.merge(e) }
  #=> {"TRANSACTIONDATE"=>{"column_name"=>"TRANSACTIONDATE",
  #      "actual_data_type"=>"TIMESTAMP", "expected_data_type"=>"NUMBER"},
  #    "NONINTERESTEXPENSE"=>{"column_name"=>"NONINTERESTEXPENSE",
  #      "actual_data_type"=>"VARCHAR", "expected_data_type"=>"NUMBER"}}
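
For readers unfamiliar with the block form of Hash#merge, here is a small standalone sketch. The data and the names left and right are made up for illustration. The block is called only for keys present in both hashes; other keys are copied over unchanged.

```ruby
left  = { 'A' => { 'column_name' => 'A', 'actual_data_type' => 'VARCHAR' } }
right = { 'A' => { 'column_name' => 'A', 'expected_data_type' => 'NUMBER' },
          'B' => { 'column_name' => 'B', 'expected_data_type' => 'DATE' } }

# The block receives the shared key, the receiver's value and the
# argument's value, and returns the value to store for that key.
merged = left.merge(right) { |_key, from_left, from_right| from_left.merge(from_right) }

merged['A']
  #=> {"column_name"=>"A", "actual_data_type"=>"VARCHAR", "expected_data_type"=>"NUMBER"}
merged['B']
  #=> {"column_name"=>"B", "expected_data_type"=>"DATE"}
```

The 'B' entry shows a side effect worth knowing about: a column that appears in only one of the two differences is carried through with just one of the two type keys.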

Lastly, drop the keys.

k = j.values
  #=> [{"column_name"=>"TRANSACTIONDATE", "actual_data_type"=>"TIMESTAMP",
  #     "expected_data_type"=>"NUMBER"},
  #    {"column_name"=>"NONINTERESTEXPENSE", "actual_data_type"=>"VARCHAR",
  #     "expected_data_type"=>"NUMBER"}]


Tags: ruby hash